This final project examines the relations between several text sentiment-derived features from PPG Paints sales representatives’ reports of interactions with customers and (1) the amount of time sales reps spend on a product and (2) whether the product achieves its sales target. Text sentiment, in this context, refers to the extent to which words, phrases, sentences, and/or paragraphs of text are “positive” (e.g., “I love this paint color!”) or “negative” (e.g., “I’m concerned about the price.”). More information on the dataset is provided below.
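There are two broad goals, corresponding to the two outcomes above: (1) a regression goal, modeling the average hours per week sales reps spend on a product (response), and (2) a classification goal, modeling whether the product achieves its sales target (outcome).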
In this document, we start by simply exploring the data!
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5.9000 ✓ purrr 0.3.4
## ✓ tibble 3.1.6 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(sjPlot)
library(knitr)
library(corrplot)
## corrplot 0.90 loaded
Variable descriptions:

- region: anonymized, global region of the customer purchasing the product (categorical)
- customer: anonymized indicator of the company purchasing the product (categorical)
- xb_[##]: sentiment-derived features from the Bing lexicon (continuous)
- xn_[##]: sentiment-derived features from the NRC lexicon (continuous)
- xa_[##]: sentiment-derived features from the AFINN lexicon (continuous)
- xw_[##]: word count sentiment-derived features (continuous)
- xs_[##]: sentimentr-derived features (continuous)
- response: average hours per week associated with a product sold to a customer (continuous)
- outcome: whether a product achieved its sales goal, where outcome = event means that the product did NOT achieve its goal (categorical)

There are several sentiment-derived features for each lexicon (33 features in total). For instance, there are 3 inputs associated with the word count sentiment-derived features (xw_01, xw_02, and xw_03) and 8 inputs associated with the Bing lexicon (xb_01, …, xb_08). Importantly, the lexicon-based input values reflect the polarity of the sentiment of the sales reps’ reports, i.e., the extent to which the sentiment is positive or negative. Positive values indicate positive sentiment, negative values indicate negative sentiment, and 0 indicates neutral sentiment. Additionally, the absolute value of an input represents the strength of the sentiment, where greater absolute values represent stronger positive or negative sentiment. The word count features are the exception: as discussed below, they count words rather than measure polarity.
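The sign and magnitude convention above can be illustrated with a short base R sketch. The helper polarity_label() and the example values are hypothetical and are not part of the original analysis. The convention applies only to the polarity-based inputs, not to the word counts. The sketch also rebuilds the 33 input names from the naming scheme to confirm the per-lexicon counts.

```r
# Hypothetical helper: label the polarity of a sentiment value by its sign
polarity_label <- function(x) {
  ifelse(x > 0, "positive", ifelse(x < 0, "negative", "neutral"))
}

# Illustrative values (not taken from the data)
x <- c(4, 1, -6, -1, 0)
data.frame(value    = x,
           polarity = polarity_label(x),
           strength = abs(x))   # larger absolute value = stronger sentiment

# Rebuild the input names from the naming scheme and count them by lexicon prefix
feature_names <- c(sprintf("xb_%02d", 1:8), sprintf("xn_%02d", 1:8),
                   sprintf("xa_%02d", 1:8), sprintf("xw_%02d", 1:3),
                   sprintf("xs_%02d", 1:6))
table(sub("_.*", "", feature_names))  # xa 8, xb 8, xn 8, xs 6, xw 3
length(feature_names)                 # 33
```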
Below is a glance at the dataset; specifically, we see the variable names, types, and the first few observations. Each row of data corresponds to a single product sold to a customer.
# Point R at the folder containing the data (adjust this path for your own machine)
setwd("~/Desktop/Courses/Spring 2022 - Machine Learning/Final/")
df <- read.csv("final_project_train.csv")
str(df)
## 'data.frame': 677 obs. of 38 variables:
## $ rowid : int 1 3 4 5 8 9 11 14 15 16 ...
## $ region : chr "XX" "XX" "XX" "XX" ...
## $ customer: chr "B" "B" "B" "B" ...
## $ xb_01 : num 4 1 2 2.52 2.55 ...
## $ xb_02 : int 4 1 2 11 6 6 10 12 9 10 ...
## $ xb_03 : int 4 1 2 -6 -1 1 -4 -4 -2 -4 ...
## $ xn_01 : num 3 2 2 1.533 0.839 ...
## $ xn_02 : int 3 2 4 9 3 8 6 10 10 4 ...
## $ xn_03 : int 3 2 0 -3 -4 -2 -5 -6 -3 -5 ...
## $ xa_01 : num 12 3 9 7.08 6.45 ...
## $ xa_02 : int 12 3 9 29 17 18 24 27 20 19 ...
## $ xa_03 : int 12 3 9 -7 -2 2 -9 -5 -3 -3 ...
## $ xb_04 : num 1.333 1 1 0.895 1.225 ...
## $ xb_05 : num 1.33 1 1 -2 -0.5 ...
## $ xb_06 : num 1.33 1 1 4 4 ...
## $ xb_07 : num 4 1 2 1.93 1.97 ...
## $ xb_08 : num -1 1 0 -0.08 0.355 ...
## $ xn_04 : num 1 2 1 0.527 0.469 ...
## $ xn_05 : num 1 2 0 -1 -1.33 ...
## $ xn_06 : num 1 2 2 2.5 3 2 4 4 3 2 ...
## $ xn_07 : num 3 2 2.5 1.49 1.23 ...
## $ xn_08 : num -1 2 -1 -0.44 -0.452 ...
## $ xa_04 : num 6 3 6.75 2.43 3.02 ...
## $ xa_05 : num 6 3 4.5 -3.5 -0.667 ...
## $ xa_06 : num 6 3 9 9 13 6 16 14 6 6 ...
## $ xa_07 : num 9 3 7.5 4.47 4.61 ...
## $ xa_08 : num 3 3 6 0.707 1.323 ...
## $ xw_01 : num 23 17 52.5 64.5 54.8 ...
## $ xw_02 : int 23 17 48 0 12 15 0 0 0 7 ...
## $ xw_03 : int 23 17 57 106 105 101 107 109 109 104 ...
## $ xs_01 : num 0.262 0.331 0.24 0.142 0.244 ...
## $ xs_02 : num 0.262 0.331 0.19 -0.733 -0.122 ...
## $ xs_03 : num 0.262 0.331 0.289 0.55 1.313 ...
## $ xs_04 : num 0.538 0.429 0.368 0.287 0.238 ...
## $ xs_05 : num 0.5376 0.4287 0.2485 0 0.0434 ...
## $ xs_06 : num 0.538 0.429 0.487 0.636 0.433 ...
## $ response: num 2.62 1.18 2.22 2.73 1.48 ...
## $ outcome : chr "non_event" "non_event" "event" "non_event" ...
The number of observations differs by region: region ZZ has the most observations (294), and region XX has the fewest (161).
df %>% count(region)
## region n
## 1 XX 161
## 2 YY 222
## 3 ZZ 294
df %>% ggplot(aes(x = region)) +
geom_bar()
Similarly, the number of observations differs by customer. The “Other” group of customers has the most observations (245), followed by customer G (113), while customer D has the fewest (32).
df %>% count(customer)
## customer n
## 1 A 55
## 2 B 52
## 3 D 32
## 4 E 35
## 5 G 113
## 6 K 38
## 7 M 71
## 8 Other 245
## 9 Q 36
df %>% ggplot(aes(x = customer)) +
geom_bar()
df_continous_inputs <- df %>% dplyr::select(starts_with("x"))
df_continous_inputs_summary <- df_continous_inputs %>%
psych::describe() %>%
as.data.frame() %>%
dplyr::select(n, mean, sd, median, min, max, skew, kurtosis, se)
There are 8 sentiment-derived features associated with the Bing lexicon. Their distributions are roughly Gaussian-like (unimodal and fairly symmetric, with all skew values below 1). Some of them, such as xb_04 and xb_07, have heavy tails (kurtosis above 4). All of the mean sentiment values are positive, which suggests that sales reps’ reports tended to include positive words.
kable(filter(df_continous_inputs_summary,
grepl("xb", row.names(df_continous_inputs_summary))), #include only rows that start with "xb"
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xb_01 | 677 | 3.38 | 2.02 | 3.25 | -4 | 14 | 0.56 | 2.65 | 0.08 |
| xb_02 | 677 | 5.75 | 3.31 | 6.00 | -4 | 15 | 0.04 | -0.33 | 0.13 |
| xb_03 | 677 | 1.22 | 3.01 | 1.00 | -7 | 14 | 0.51 | 0.63 | 0.12 |
| xb_04 | 677 | 1.15 | 0.69 | 1.14 | -2 | 5 | 0.76 | 5.08 | 0.03 |
| xb_05 | 677 | 0.41 | 1.07 | 0.40 | -3 | 5 | 0.34 | 1.04 | 0.04 |
| xb_06 | 677 | 2.11 | 1.43 | 2.00 | -2 | 9 | 0.98 | 1.97 | 0.05 |
| xb_07 | 677 | 2.10 | 0.86 | 2.00 | -1 | 7 | 0.92 | 4.32 | 0.03 |
| xb_08 | 677 | 0.21 | 0.96 | 0.21 | -4 | 5 | 0.34 | 3.51 | 0.04 |
input_names <- df %>% select(starts_with("xb")) %>% colnames()
df %>%
select(all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid")) %>%
ggplot(mapping = aes(x = value)) +
geom_histogram(bins = 20) +
facet_wrap(~name, scales = "free") +
theme_bw()
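Our descriptions of these histograms as “Gaussian-like” rely on the skew and kurtosis columns. The sketch below shows what those columns measure. The base R helper moment_shape() is hypothetical and is not part of the original analysis. It computes moment-based skewness and excess kurtosis, both of which are 0 for a normal distribution, and applies them to simulated normal and exponential data. psych::describe() uses closely related estimators, so its values can differ slightly, especially for small samples.

```r
# Moment-based shape statistics (both are 0 for a normal distribution)
moment_shape <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  m2 <- mean(d^2)
  c(skew = mean(d^3) / m2^1.5,
    excess_kurtosis = mean(d^4) / m2^2 - 3)
}

set.seed(1)
normal_shape <- moment_shape(rnorm(10000))  # both values near 0 for Gaussian data
exp_shape    <- moment_shape(rexp(10000))   # right-skewed: theoretical skew 2, excess kurtosis 6
round(rbind(normal = normal_shape, exponential = exp_shape), 2)
```

With the data loaded, moment_shape(df$xb_04) should give values close to the xb_04 row of the table above.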
There are 8 sentiment-derived features associated with the NRC lexicon. Their distributions are also roughly Gaussian-like, though xn_04 and xn_07 are heavy-tailed (kurtosis above 6). Unlike the Bing features, which all had positive means, the NRC features have a mixture of positive and negative mean sentiment values (ranging from -.40 to 3.66). However, every negative mean is closer to 0 than every positive mean: the largest negative mean in absolute value is .40, while the smallest positive mean is .60. According to this lexicon, then, the words in the sales reps’ reports lean neutral or positive rather than negative.
kable(filter(df_continous_inputs_summary,
grepl("xn", row.names(df_continous_inputs_summary))), #include only rows that start with "xn"
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xn_01 | 677 | 1.56 | 1.76 | 1.60 | -4 | 10 | 0.27 | 2.59 | 0.07 |
| xn_02 | 677 | 3.66 | 2.96 | 4.00 | -4 | 13 | 0.11 | -0.03 | 0.11 |
| xn_03 | 677 | -0.40 | 2.67 | -1.00 | -7 | 10 | 0.39 | 0.40 | 0.10 |
| xn_04 | 677 | 0.60 | 0.73 | 0.60 | -4 | 5 | 0.32 | 6.31 | 0.03 |
| xn_05 | 677 | -0.16 | 1.09 | -0.25 | -4 | 5 | 0.35 | 1.24 | 0.04 |
| xn_06 | 677 | 1.48 | 1.32 | 1.25 | -4 | 7 | 0.75 | 1.90 | 0.05 |
| xn_07 | 677 | 1.41 | 0.78 | 1.40 | -4 | 5 | -0.29 | 6.45 | 0.03 |
| xn_08 | 677 | -0.27 | 1.01 | -0.31 | -4 | 5 | 0.39 | 2.72 | 0.04 |
input_names <- df %>% select(starts_with("xn")) %>% colnames()
df %>%
select(all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid")) %>%
ggplot(mapping = aes(x = value)) +
geom_histogram(bins = 20) +
facet_wrap(~name, scales = "free") +
theme_bw()
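The claim that the negative NRC means are small in magnitude relative to the positive ones can be checked directly against the summary table. This sketch simply copies the eight rounded means from the table above; it is not part of the original analysis.

```r
# NRC feature means copied from the summary table above
nrc_means <- c(xn_01 = 1.56, xn_02 = 3.66, xn_03 = -0.40, xn_04 = 0.60,
               xn_05 = -0.16, xn_06 = 1.48, xn_07 = 1.41, xn_08 = -0.27)

neg <- nrc_means[nrc_means < 0]
pos <- nrc_means[nrc_means > 0]

range(nrc_means)             # -0.40 to 3.66
max(abs(neg)) < min(pos)     # TRUE: every negative mean is smaller in magnitude
round(c(mean_abs_negative = mean(abs(neg)), mean_positive = mean(pos)), 2)  # 0.28 vs 1.74
```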
There are 8 sentiment-derived features associated with the AFINN lexicon. Their distributions are roughly Gaussian-like, although several (xa_01, xa_04, and xa_06, with skew above 1) have a longer right tail. All of the mean sentiment values are positive, which suggests that sales reps’ reports tended to include positive words; this matches what we saw with the Bing features. The lower bound of the AFINN inputs (min = -9) is similar to that of the Bing and NRC inputs (min = -7), but the upper bound is much greater: the largest value among the Bing and NRC features is 15, while the largest AFINN value is 38. This likely reflects differences in how the sentiment analyses were conducted and how the feature values were calculated. For example, AFINN scores each word on an integer scale from -5 to +5, whereas the Bing lexicon only labels words as positive or negative.
kable(filter(df_continous_inputs_summary,
grepl("xa", row.names(df_continous_inputs_summary))), #include only rows that start with "xa"
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xa_01 | 677 | 8.07 | 3.92 | 8.00 | -3 | 35 | 1.04 | 5.14 | 0.15 |
| xa_02 | 677 | 13.24 | 7.01 | 13.00 | -3 | 38 | 0.27 | -0.20 | 0.27 |
| xa_03 | 677 | 3.84 | 5.59 | 3.00 | -9 | 35 | 0.91 | 2.09 | 0.22 |
| xa_04 | 677 | 2.94 | 1.41 | 2.93 | -2 | 12 | 1.07 | 5.61 | 0.05 |
| xa_05 | 677 | 1.38 | 2.23 | 1.33 | -8 | 12 | 0.23 | 2.08 | 0.09 |
| xa_06 | 677 | 5.15 | 3.35 | 4.33 | -2 | 23 | 1.40 | 3.14 | 0.13 |
| xa_07 | 677 | 4.70 | 1.70 | 4.61 | -2 | 13 | 0.88 | 3.86 | 0.07 |
| xa_08 | 677 | 1.22 | 1.89 | 1.14 | -5 | 12 | 0.69 | 4.27 | 0.07 |
input_names <- df %>% select(starts_with("xa")) %>% colnames()
df %>%
select(all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid")) %>%
ggplot(mapping = aes(x = value)) +
geom_histogram(bins = 20) +
facet_wrap(~name, scales = "free") +
theme_bw()
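To see why AFINN-based values can reach much larger magnitudes than Bing-based ones, consider a toy sketch. Bing-style scoring counts each matched word as +1 or -1, while AFINN-style scoring uses integer scores between -5 and +5. The two mini-lexicons and the helper score_text() below are hypothetical (the words and scores are made up), and the actual features in this dataset were likely derived in more involved ways.

```r
# Hypothetical mini-lexicons (words and scores are made up for illustration):
# Bing-style scoring uses +1/-1, AFINN-style scoring uses integers from -5 to +5
bing_like  <- c(love = 1, great = 1, concerned = -1, expensive = -1)
afinn_like <- c(love = 3, great = 3, concerned = -2, expensive = -2)

# Sum the scores of the words that appear in a lexicon; other words are ignored
score_text <- function(text, lexicon) {
  words <- tolower(unlist(strsplit(text, "[^A-Za-z']+")))
  words <- words[words != ""]
  sum(lexicon[words], na.rm = TRUE)
}

report <- "I love this paint color! Great finish and a great price."
score_text(report, bing_like)   # 3: love + great + great
score_text(report, afinn_like)  # 9: the same words, larger per-word scores
```

The same text produces a score three times as large under the AFINN-style lexicon, even though the same words are matched.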
There are 3 sentiment-derived features associated with word counts. With the exception of xw_01, their distributions are not Gaussian-like. xw_02 is skewed to the right (skew = 0.84; mean 31.87 > median 24.00), and xw_03 is skewed to the left (skew = -0.88; mean 79.07 < median 93.00). Unlike the Bing, NRC, and AFINN features, the word count features reflect the number of words (not their polarity), so they are bounded below by 0 (we can’t have a negative number of words!).
kable(filter(df_continous_inputs_summary,
grepl("xw", row.names(df_continous_inputs_summary))), #include only rows that start with "xw"
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xw_01 | 677 | 57.02 | 20.23 | 57.41 | 9 | 108 | 0.03 | -0.16 | 0.78 |
| xw_02 | 677 | 31.87 | 29.26 | 24.00 | 0 | 108 | 0.84 | -0.28 | 1.12 |
| xw_03 | 677 | 79.07 | 27.67 | 93.00 | 9 | 113 | -0.88 | -0.55 | 1.06 |
input_names <- df %>% select(starts_with("xw")) %>% colnames()
df %>%
select(all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid")) %>%
ggplot(mapping = aes(x = value)) +
geom_histogram(bins = 20) +
facet_wrap(~name, scales = "free") +
theme_bw()
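The direction of skew can also be read from the mean and median alone. A mean above the median signals a longer right tail, and a mean below the median signals a longer left tail. The sketch below applies Pearson's second skewness coefficient, 3 × (mean − median) / SD, to the values in the word count table above. This coefficient is not part of the original analysis, and the 0.2 cutoff for “roughly symmetric” is an arbitrary choice made for illustration.

```r
# Summary statistics copied from the word count table above
xw <- data.frame(
  feature = c("xw_01", "xw_02", "xw_03"),
  mean    = c(57.02, 31.87, 79.07),
  median  = c(57.41, 24.00, 93.00),
  sd      = c(20.23, 29.26, 27.67)
)

# Pearson's second skewness coefficient: positive means a longer right tail
xw$pearson_skew <- round(3 * (xw$mean - xw$median) / xw$sd, 2)

# Hypothetical cutoff of 0.2 for calling a distribution roughly symmetric
xw$direction <- ifelse(abs(xw$pearson_skew) < 0.2, "roughly symmetric",
                       ifelse(xw$pearson_skew > 0, "right", "left"))
xw   # xw_01 about -0.06, xw_02 about 0.81 (right), xw_03 about -1.51 (left)
```

The signs agree with the skew column reported by psych::describe() (0.03, 0.84, and -0.88).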
sentimentr values

There are 6 sentiment-derived features associated with the sentimentr package. Only xs_01 (skew = -0.02) is close to symmetric and Gaussian-like. xs_04 is roughly bell-shaped but has a longer right tail (skew = 1.01). xs_02 is slightly skewed to the left (skew = -0.33), while xs_03, xs_05, and xs_06 are skewed to the right (skew = 0.75, 1.07, and 0.74, respectively). Across features, the mean sentiment values are positive but close to 0, which suggests that sales reps’ reports tended to include neutral to positive words. Interestingly, these values span a much narrower range than those from the other lexicons: across all sentimentr features, the minimum value is -.90 and the maximum value is 1.79. Again, this likely reflects differences in how the sentiment analyses were conducted and how the feature values were calculated.
kable(filter(df_continous_inputs_summary,
grepl("xs", row.names(df_continous_inputs_summary))), #include only rows that start with "xs"
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xs_01 | 677 | 0.21 | 0.14 | 0.22 | -0.36 | 0.75 | -0.02 | 2.04 | 0.01 |
| xs_02 | 677 | 0.02 | 0.25 | 0.04 | -0.90 | 0.69 | -0.33 | 0.19 | 0.01 |
| xs_03 | 677 | 0.42 | 0.29 | 0.39 | -0.36 | 1.79 | 0.75 | 1.31 | 0.01 |
| xs_04 | 677 | 0.30 | 0.11 | 0.29 | 0.00 | 0.90 | 1.01 | 2.95 | 0.00 |
| xs_05 | 677 | 0.19 | 0.14 | 0.16 | 0.00 | 0.90 | 1.07 | 1.25 | 0.01 |
| xs_06 | 677 | 0.47 | 0.23 | 0.43 | 0.00 | 1.31 | 0.74 | 0.52 | 0.01 |
input_names <- df %>% select(starts_with("xs")) %>% colnames()
df %>%
select(all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid")) %>%
ggplot(mapping = aes(x = value)) +
geom_histogram(bins = 20) +
facet_wrap(~name, scales = "free") +
theme_bw()
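To put the narrower sentimentr range in perspective, the sketch below compares the overall minimum and maximum for each lexicon, copied from the four summary tables above (Bing: -7 to 15, NRC: -7 to 13, AFINN: -9 to 38, sentimentr: -.90 to 1.79). It is a quick arithmetic check, not part of the original analysis.

```r
# Overall minimum and maximum across each lexicon's features, copied from the summary tables above
lexicon_ranges <- data.frame(
  lexicon = c("Bing", "NRC", "AFINN", "sentimentr"),
  min     = c(-7, -7, -9, -0.90),
  max     = c(15, 13, 38, 1.79)
)

lexicon_ranges$width <- lexicon_ranges$max - lexicon_ranges$min
# Width relative to sentimentr: roughly 8x (Bing), 7x (NRC), and 17x (AFINN)
lexicon_ranges$relative_to_sentimentr <-
  round(lexicon_ranges$width / lexicon_ranges$width[lexicon_ranges$lexicon == "sentimentr"], 1)
lexicon_ranges
```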
There is a clear imbalance in the outcome. The event value represents a product that did not meet its sales goal, and only 127 of the 677 products (about 19%) are events. Thus, most products (550, about 81%) achieved their sales goals.
df %>% count(outcome)
## outcome n
## 1 event 127
## 2 non_event 550
df %>% ggplot(aes(x = outcome)) +
geom_bar()
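One practical consequence of this imbalance, sketched here as a hedged aside: a trivial classifier that always predicts non_event would already be correct about 81% of the time. That figure is a natural reference point when judging classification accuracy later on. The counts below are copied from the table above.

```r
# Outcome counts copied from the table above
outcome_counts <- c(event = 127, non_event = 550)

props <- outcome_counts / sum(outcome_counts)
round(props, 3)   # event 0.188, non_event 0.812

# Accuracy of a trivial classifier that always predicts the majority class
baseline_accuracy <- max(props)
round(baseline_accuracy, 3)  # 0.812
```

With the data loaded, prop.table(table(df$outcome)) returns the same proportions directly.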
Below we see the distributions of the response variable, which reflects the average hours per week that sales reps spent engaging with a product and customer, and of its log-transformed values. The response is bounded below by 0 hours (a log transformation maps it onto the whole real line) and is strongly right-skewed (skew = 3.62), so we apply a natural log transformation for use in our models later on. Both distributions are skewed to the right, but the log transformation greatly reduces the skew (from 3.62 to 0.32). On average, sales reps spent 2.68 hours per week (mean; median = 2.29) on interactions with a customer about a product.
describe_response <- df %>%
select(response) %>%
mutate(log_response = log(response)) %>%
psych::describe() %>%
as.data.frame() %>%
dplyr::select(n, mean, sd, median, min, max, skew, kurtosis, se)
kable(describe_response, digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| response | 677 | 2.68 | 1.75 | 2.29 | 0.57 | 22.92 | 3.62 | 28.81 | 0.07 |
| log_response | 677 | 0.83 | 0.53 | 0.83 | -0.56 | 3.13 | 0.32 | 0.13 | 0.02 |
df %>%
select(response) %>%
mutate(log_response = log(response)) %>%
pivot_longer(cols = c("response", "log_response"),
values_to = "value",
names_to = "variable") %>%
ggplot(mapping = aes(x = value, fill = variable)) +
geom_histogram(binwidth = .33,
alpha=0.6,
position = "identity") +
scale_fill_brewer(palette="Set1")
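Because the models will be fit to log(response), it helps to know what exponentiating a log-scale summary gives back. The sketch below illustrates two general properties, not steps from the original analysis. First, exp() of the median of the log values returns the median of the response, because the log is monotone and n = 677 is odd. Second, exp() of the mean of the log values returns the geometric mean, which for positive data is never larger than the arithmetic mean (the AM-GM inequality).

```r
# Summary values of log(response) copied from the table above
log_mean   <- 0.83
log_median <- 0.83

round(exp(log_median), 2)  # 2.29, matching the reported median of response
round(exp(log_mean), 2)    # 2.29, the geometric mean, below the arithmetic mean of 2.68

# The same idea on a small made-up vector with an odd number of values
x <- c(1, 2, 4, 8, 16)
median(log(x)) == log(median(x))  # TRUE: the median survives a monotone transform
exp(mean(log(x)))                 # 4: the geometric mean
mean(x)                           # 6.2: the arithmetic mean is larger
```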
df_region_continous_inputs <- df %>% dplyr::select(region, starts_with("x"))
df_region_continous_inputs_summary <- df_region_continous_inputs %>%
psych::describeBy(group = "region") #get grouped sum stats
#extract each group's stats
df_region_continous_inputs_summary_XX <- df_region_continous_inputs_summary$XX[-1,] %>%
as.data.frame() %>%
dplyr::select(n, mean, sd, median, min, max, skew, kurtosis, se)
df_region_continous_inputs_summary_YY <- df_region_continous_inputs_summary$YY[-1,] %>%
as.data.frame() %>%
dplyr::select(n, mean, sd, median, min, max, skew, kurtosis, se)
df_region_continous_inputs_summary_ZZ <- df_region_continous_inputs_summary$ZZ[-1,] %>%
as.data.frame() %>%
dplyr::select(n, mean, sd, median, min, max, skew, kurtosis, se)
At a glance, the summary statistics of the continuous variables look broadly similar across regions. For instance, the mean Bing sentiment values in each region are positive or close to neutral (0).
kable(filter(df_region_continous_inputs_summary_XX,
grepl("xb", row.names(df_region_continous_inputs_summary_XX))),
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xb_01 | 161 | 3.35 | 1.66 | 3.25 | -1.00 | 12 | 1.11 | 5.27 | 0.13 |
| xb_02 | 161 | 6.71 | 3.30 | 7.00 | -1.00 | 15 | -0.03 | -0.52 | 0.26 |
| xb_03 | 161 | 0.39 | 2.82 | 0.00 | -6.00 | 12 | 0.82 | 1.51 | 0.22 |
| xb_04 | 161 | 1.11 | 0.54 | 1.10 | -0.33 | 4 | 1.36 | 6.85 | 0.04 |
| xb_05 | 161 | 0.05 | 1.05 | 0.00 | -3.00 | 4 | 0.30 | 1.22 | 0.08 |
| xb_06 | 161 | 2.42 | 1.40 | 2.00 | -0.33 | 7 | 0.69 | 0.35 | 0.11 |
| xb_07 | 161 | 2.03 | 0.71 | 2.00 | 0.00 | 7 | 2.08 | 14.06 | 0.06 |
| xb_08 | 161 | 0.16 | 0.78 | 0.16 | -2.00 | 4 | 0.52 | 4.38 | 0.06 |
kable(filter(df_region_continous_inputs_summary_YY,
grepl("xb", row.names(df_region_continous_inputs_summary_YY))),
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xb_01 | 222 | 3.19 | 1.64 | 3.23 | -2.0 | 10 | 0.01 | 1.88 | 0.11 |
| xb_02 | 222 | 6.68 | 3.36 | 7.00 | -2.0 | 15 | -0.23 | -0.33 | 0.23 |
| xb_03 | 222 | -0.01 | 2.78 | 0.00 | -7.0 | 10 | 0.56 | 0.45 | 0.19 |
| xb_04 | 222 | 1.04 | 0.50 | 1.05 | -0.5 | 3 | -0.15 | 2.70 | 0.03 |
| xb_05 | 222 | -0.01 | 0.94 | 0.00 | -2.5 | 3 | 0.32 | 0.22 | 0.06 |
| xb_06 | 222 | 2.50 | 1.60 | 2.00 | -0.5 | 9 | 0.90 | 1.41 | 0.11 |
| xb_07 | 222 | 2.00 | 0.63 | 2.00 | 0.0 | 5 | 1.21 | 5.90 | 0.04 |
| xb_08 | 222 | 0.10 | 0.73 | 0.09 | -4.0 | 3 | -0.76 | 4.86 | 0.05 |
kable(filter(df_region_continous_inputs_summary_ZZ,
grepl("xb", row.names(df_region_continous_inputs_summary_ZZ))),
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xb_01 | 294 | 3.53 | 2.41 | 3.21 | -4 | 14 | 0.45 | 1.43 | 0.14 |
| xb_02 | 294 | 4.52 | 2.86 | 4.50 | -4 | 14 | 0.03 | -0.03 | 0.17 |
| xb_03 | 294 | 2.60 | 2.70 | 2.00 | -4 | 14 | 0.72 | 1.06 | 0.16 |
| xb_04 | 294 | 1.26 | 0.86 | 1.29 | -2 | 5 | 0.51 | 3.17 | 0.05 |
| xb_05 | 294 | 0.92 | 0.96 | 1.00 | -2 | 5 | 0.67 | 2.09 | 0.06 |
| xb_06 | 294 | 1.64 | 1.14 | 1.50 | -2 | 8 | 0.92 | 3.65 | 0.07 |
| xb_07 | 294 | 2.20 | 1.06 | 2.00 | -1 | 6 | 0.42 | 1.59 | 0.06 |
| xb_08 | 294 | 0.32 | 1.17 | 0.50 | -4 | 5 | 0.32 | 1.87 | 0.07 |
There are some differences in the sentiment variable distributions by region. Overall, the distributions are more similar between regions XX and YY. In the density plots, region XX is shown in red and region YY in blue, so the purple areas show where their distributions overlap. In contrast, the distributions for region ZZ (green) deviate from the other two. They differ either in their means (higher or lower than the values shared by regions XX and YY) or in their variability (i.e., the width of the distribution).
For example, for xb_01, the means are similar across regions (3.19 to 3.53), but the values for region ZZ are more variable (SD = 2.41, versus 1.66 and 1.64 for regions XX and YY). Region ZZ's values range from -4 to 14, while those for regions XX and YY range from -1 to 12 and from -2 to 10, respectively. Additionally, for xb_05, the means for regions XX and YY are .05 and -.01, respectively, which is close to “neutral,” while the mean for region ZZ is .92, which leans more “positive.”
input_names <- df_region_continous_inputs %>% select(starts_with("xb")) %>% colnames()
df_region_continous_inputs %>%
select(region, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "region")) %>%
ggplot(mapping = aes(x = value, fill = region)) +
geom_density(alpha = .33) +
scale_fill_brewer(palette="Set1") +
facet_wrap(~name, scales = "free") +
theme_bw()
input_names <- df_region_continous_inputs %>% select(starts_with("xb")) %>% colnames()
df_region_continous_inputs %>%
select(region, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "region")) %>%
ggplot(mapping = aes(x = value, fill = region)) +
geom_histogram(bins = 25, alpha = .5) +
scale_fill_brewer(palette="Set1") +
facet_wrap(~name, scales = "free") +
theme_bw()
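The greater spread for region ZZ on xb_01 can be expressed as a ratio against region XX. This sketch copies the SDs and ranges from the three regional Bing tables above. The ratio is a convenience for reading the tables, not part of the original analysis.

```r
# xb_01 summaries by region, copied from the three regional Bing tables above
xb_01_stats <- data.frame(
  region = c("XX", "YY", "ZZ"),
  sd     = c(1.66, 1.64, 2.41),
  min    = c(-1, -2, -4),
  max    = c(12, 10, 14)
)

# Spread of each region relative to region XX
xb_01_stats$range_width    <- xb_01_stats$max - xb_01_stats$min
xb_01_stats$sd_ratio_to_XX <- round(xb_01_stats$sd / xb_01_stats$sd[xb_01_stats$region == "XX"], 2)
xb_01_stats   # ZZ: range width 18 and an SD about 1.45 times that of XX
```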
As with the Bing features, the NRC feature distributions are more similar between regions XX and YY, while those for region ZZ differ somewhat from the other two. Even when the regional means are similar, as for xn_01 and xn_04, the values for region ZZ are more variable (its density curve is wider). For example, the SD of xn_01 is 2.07 in region ZZ, versus 1.55 and 1.42 in regions XX and YY. The means for regions XX and YY also tend to be closer to each other than to ZZ. For instance, for xn_05, region XX has a mean of -.50 and region YY a mean of -.47 (leaning negative), while region ZZ has a mean of .27 (leaning positive).
kable(filter(df_region_continous_inputs_summary_XX,
grepl("xn", row.names(df_region_continous_inputs_summary_XX))),
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xn_01 | 161 | 1.59 | 1.55 | 1.67 | -2.5 | 10 | 1.07 | 6.29 | 0.12 |
| xn_02 | 161 | 4.62 | 3.04 | 4.00 | -2.0 | 12 | 0.19 | -0.24 | 0.24 |
| xn_03 | 161 | -1.09 | 2.47 | -1.00 | -6.0 | 10 | 0.93 | 2.54 | 0.19 |
| xn_04 | 161 | 0.58 | 0.61 | 0.57 | -1.0 | 3 | 0.48 | 2.98 | 0.05 |
| xn_05 | 161 | -0.50 | 1.07 | -0.50 | -3.0 | 3 | 0.31 | 0.76 | 0.08 |
| xn_06 | 161 | 1.74 | 1.18 | 1.67 | -1.0 | 6 | 0.46 | 0.77 | 0.09 |
| xn_07 | 161 | 1.44 | 0.65 | 1.40 | -1.0 | 4 | 0.10 | 3.00 | 0.05 |
| xn_08 | 161 | -0.33 | 0.80 | -0.38 | -3.0 | 3 | 0.75 | 3.00 | 0.06 |
kable(filter(df_region_continous_inputs_summary_YY,
grepl("xn", row.names(df_region_continous_inputs_summary_YY))),
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xn_01 | 222 | 1.60 | 1.42 | 1.67 | -3.5 | 6.25 | -0.28 | 2.53 | 0.10 |
| xn_02 | 222 | 4.66 | 2.87 | 5.00 | -3.0 | 13.00 | -0.24 | 0.11 | 0.19 |
| xn_03 | 222 | -1.32 | 2.58 | -2.00 | -7.0 | 6.00 | 0.36 | -0.15 | 0.17 |
| xn_04 | 222 | 0.60 | 0.53 | 0.61 | -2.0 | 3.00 | -0.22 | 4.75 | 0.04 |
| xn_05 | 222 | -0.47 | 0.96 | -0.67 | -3.0 | 3.00 | 0.52 | 0.48 | 0.06 |
| xn_06 | 222 | 1.96 | 1.48 | 1.75 | -2.0 | 7.00 | 0.76 | 0.93 | 0.10 |
| xn_07 | 222 | 1.41 | 0.53 | 1.43 | -2.0 | 3.25 | -1.02 | 8.10 | 0.04 |
| xn_08 | 222 | -0.29 | 0.79 | -0.30 | -3.0 | 3.00 | 0.12 | 2.52 | 0.05 |
kable(filter(df_region_continous_inputs_summary_ZZ,
grepl("xn", row.names(df_region_continous_inputs_summary_ZZ))),
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xn_01 | 294 | 1.51 | 2.07 | 1.33 | -4 | 9 | 0.25 | 1.16 | 0.12 |
| xn_02 | 294 | 2.39 | 2.47 | 2.00 | -4 | 9 | -0.05 | 0.03 | 0.14 |
| xn_03 | 294 | 0.67 | 2.45 | 1.00 | -5 | 9 | 0.38 | 0.43 | 0.14 |
| xn_04 | 294 | 0.62 | 0.90 | 0.56 | -4 | 5 | 0.31 | 4.79 | 0.05 |
| xn_05 | 294 | 0.27 | 1.06 | 0.20 | -4 | 5 | 0.28 | 2.60 | 0.06 |
| xn_06 | 294 | 0.98 | 1.07 | 1.00 | -4 | 6 | 0.42 | 3.65 | 0.06 |
| xn_07 | 294 | 1.38 | 0.99 | 1.13 | -4 | 5 | -0.19 | 4.31 | 0.06 |
| xn_08 | 294 | -0.21 | 1.24 | 0.00 | -4 | 5 | 0.28 | 1.55 | 0.07 |
input_names <- df_region_continous_inputs %>% select(starts_with("xn")) %>% colnames()
df_region_continous_inputs %>%
select(region, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "region")) %>%
ggplot(mapping = aes(x = value, fill = region)) +
geom_density(alpha = .33) +
scale_fill_brewer(palette="Set1") +
facet_wrap(~name, scales = "free") +
theme_bw()
input_names <- df_region_continous_inputs %>% select(starts_with("xn")) %>% colnames()
df_region_continous_inputs %>%
select(region, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "region")) %>%
ggplot(mapping = aes(x = value, fill = region)) +
geom_histogram(bins = 25, alpha = .5) +
scale_fill_brewer(palette="Set1") +
facet_wrap(~name, scales = "free") +
theme_bw()
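To gauge how large the regional gap on xn_05 is, one option is a standardized mean difference. We use Cohen's d with a pooled SD, which was not part of the original analysis, computed from the regional summary statistics above. Under Cohen's rough conventions (about 0.2 small, 0.5 medium, 0.8 large), the ZZ-versus-XX gap is medium to large, while the YY-versus-XX gap is negligible.

```r
# Standardized mean difference (Cohen's d with a pooled SD) from summary statistics
cohens_d <- function(m1, s1, n1, m2, s2, n2) {
  pooled_sd <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
  (m1 - m2) / pooled_sd
}

# xn_05 summaries copied from the regional NRC tables above
d_zz_xx <- cohens_d(0.27, 1.06, 294, -0.50, 1.07, 161)
d_yy_xx <- cohens_d(-0.47, 0.96, 222, -0.50, 1.07, 161)
round(c(ZZ_vs_XX = d_zz_xx, YY_vs_XX = d_yy_xx), 2)  # about 0.72 and 0.03
```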
The AFINN features show the same pattern: their distributions are more similar between regions XX and YY, while those for region ZZ differ somewhat from the other two. Even when the regional means are similar, as for xa_01 and xa_04, the values for region ZZ are more variable (its density curve is wider). For example, the SD of xa_01 is 4.77 in region ZZ, versus 3.24 and 3.03 in regions XX and YY. The means for regions XX and YY also tend to be closer to each other than to ZZ. For instance, for xa_02, region XX has a mean of 15.26 and region YY a mean of 15.51, while region ZZ has a mean of 10.43.
kable(filter(df_region_continous_inputs_summary_XX,
grepl("xa", row.names(df_region_continous_inputs_summary_XX))),
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xa_01 | 161 | 8.09 | 3.24 | 8.00 | -2 | 23 | 1.07 | 4.86 | 0.26 |
| xa_02 | 161 | 15.26 | 6.58 | 15.00 | -2 | 32 | -0.05 | -0.38 | 0.52 |
| xa_03 | 161 | 2.20 | 5.26 | 2.00 | -9 | 23 | 1.18 | 2.26 | 0.41 |
| xa_04 | 161 | 2.96 | 1.23 | 2.87 | -2 | 10 | 1.38 | 8.14 | 0.10 |
| xa_05 | 161 | 0.73 | 2.31 | 0.67 | -8 | 10 | 0.15 | 2.27 | 0.18 |
| xa_06 | 161 | 5.93 | 3.29 | 5.50 | -2 | 21 | 1.20 | 2.58 | 0.26 |
| xa_07 | 161 | 4.67 | 1.55 | 4.59 | -2 | 12 | 0.78 | 5.38 | 0.12 |
| xa_08 | 161 | 1.23 | 1.49 | 1.07 | -3 | 10 | 1.62 | 8.36 | 0.12 |
kable(filter(df_region_continous_inputs_summary_YY,
grepl("xa", row.names(df_region_continous_inputs_summary_YY))),
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xa_01 | 222 | 7.81 | 3.03 | 8.00 | -2 | 17 | -0.34 | 1.62 | 0.20 |
| xa_02 | 222 | 15.51 | 7.46 | 16.00 | -2 | 38 | -0.02 | -0.29 | 0.50 |
| xa_03 | 222 | 1.80 | 4.58 | 1.00 | -9 | 17 | 0.39 | 0.16 | 0.31 |
| xa_04 | 222 | 2.74 | 1.05 | 2.85 | -2 | 7 | -0.26 | 4.52 | 0.07 |
| xa_05 | 222 | 0.54 | 1.97 | 0.50 | -8 | 7 | -0.12 | 1.38 | 0.13 |
| xa_06 | 222 | 6.21 | 4.01 | 5.29 | -2 | 23 | 1.12 | 1.58 | 0.27 |
| xa_07 | 222 | 4.57 | 1.29 | 4.65 | -2 | 11 | -0.01 | 5.44 | 0.09 |
| xa_08 | 222 | 0.94 | 1.51 | 1.00 | -4 | 7 | 0.03 | 2.93 | 0.10 |
kable(filter(df_region_continous_inputs_summary_ZZ,
grepl("xa", row.names(df_region_continous_inputs_summary_ZZ))),
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xa_01 | 294 | 8.26 | 4.77 | 7.58 | -3.0 | 35 | 1.14 | 3.88 | 0.28 |
| xa_02 | 294 | 10.43 | 5.81 | 10.00 | -3.0 | 35 | 0.49 | 0.71 | 0.34 |
| xa_03 | 294 | 6.27 | 5.54 | 5.00 | -6.0 | 35 | 1.09 | 2.81 | 0.32 |
| xa_04 | 294 | 3.09 | 1.70 | 3.00 | -1.5 | 12 | 1.00 | 3.49 | 0.10 |
| xa_05 | 294 | 2.37 | 1.98 | 2.25 | -3.0 | 12 | 0.86 | 2.79 | 0.12 |
| xa_06 | 294 | 3.92 | 2.26 | 3.67 | -1.5 | 12 | 0.88 | 1.54 | 0.13 |
| xa_07 | 294 | 4.81 | 2.03 | 4.54 | -1.0 | 13 | 0.92 | 2.05 | 0.12 |
| xa_08 | 294 | 1.43 | 2.29 | 1.67 | -5.0 | 12 | 0.50 | 2.61 | 0.13 |
input_names <- df_region_continous_inputs %>% select(starts_with("xa")) %>% colnames()
df_region_continous_inputs %>%
select(region, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "region")) %>%
ggplot(mapping = aes(x = value, fill = region)) +
geom_density(alpha = .33) +
scale_fill_brewer(palette="Set1") +
facet_wrap(~name, scales = "free") +
theme_bw()
input_names <- df_region_continous_inputs %>% select(starts_with("xa")) %>% colnames()
df_region_continous_inputs %>%
select(region, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "region")) %>%
ggplot(mapping = aes(x = value, fill = region)) +
geom_histogram(bins = 25, alpha = .5) +
scale_fill_brewer(palette="Set1") +
facet_wrap(~name, scales = "free") +
theme_bw()
The word count features follow the same pattern: their distributions are more similar between regions XX and YY, while those for region ZZ differ somewhat from the other two. Even when the regional means are similar, as for xw_01, the values for region ZZ are more variable (SD = 23.82, versus 17.03 and 16.77 for regions XX and YY). The means for regions XX and YY also tend to be closer to each other than to ZZ. For instance, for xw_02, region XX has a mean of 24.11 and region YY a mean of 23.04, while region ZZ has a mean of 42.78. For xw_03, region XX has a mean of 87.91 and region YY a mean of 88.18, while region ZZ has a mean of 67.35.
kable(filter(df_region_continous_inputs_summary_XX,
grepl("xw", row.names(df_region_continous_inputs_summary_XX))),
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xw_01 | 161 | 58.31 | 17.03 | 58.55 | 10.5 | 108 | 0.07 | 0.94 | 1.34 |
| xw_02 | 161 | 24.11 | 26.15 | 16.00 | 0.0 | 108 | 1.40 | 1.41 | 2.06 |
| xw_03 | 161 | 87.91 | 23.12 | 98.00 | 14.0 | 110 | -1.56 | 1.52 | 1.82 |
kable(filter(df_region_continous_inputs_summary_YY,
grepl("xw", row.names(df_region_continous_inputs_summary_YY))),
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xw_01 | 222 | 58.58 | 16.77 | 59.01 | 11 | 103 | -0.22 | 0.67 | 1.13 |
| xw_02 | 222 | 23.04 | 26.98 | 14.50 | 0 | 103 | 1.23 | 0.60 | 1.81 |
| xw_03 | 222 | 88.18 | 24.11 | 98.00 | 11 | 113 | -1.65 | 1.71 | 1.62 |
kable(filter(df_region_continous_inputs_summary_ZZ,
grepl("xw", row.names(df_region_continous_inputs_summary_ZZ))),
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xw_01 | 294 | 55.13 | 23.82 | 53.22 | 9 | 104 | 0.20 | -0.82 | 1.39 |
| xw_02 | 294 | 42.78 | 29.00 | 38.50 | 0 | 104 | 0.47 | -0.76 | 1.69 |
| xw_03 | 294 | 67.35 | 28.15 | 69.00 | 9 | 110 | -0.29 | -1.30 | 1.64 |
input_names <- df_region_continous_inputs %>% select(starts_with("xw")) %>% colnames()
df_region_continous_inputs %>%
select(region, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "region")) %>%
ggplot(mapping = aes(x = value, fill = region)) +
geom_density(alpha = .33) +
scale_fill_brewer(palette="Set1") +
facet_wrap(~name, scales = "free") +
theme_bw()
input_names <- df_region_continous_inputs %>% select(starts_with("xw")) %>% colnames()
df_region_continous_inputs %>%
select(region, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "region")) %>%
ggplot(mapping = aes(x = value, fill = region)) +
geom_histogram(bins = 25, alpha = .5) +
scale_fill_brewer(palette="Set1") +
facet_wrap(~name, scales = "free") +
theme_bw()
sentimentr

The sentimentr features follow the same pattern: their distributions are more similar between regions XX and YY, while those for region ZZ differ somewhat from the other two. Even when the regional means are similar, as for xs_01 and xs_04, the values for region ZZ are somewhat more variable (its density curve is wider). For example, the SD of xs_01 is 0.16 in region ZZ, versus 0.11 and 0.12 in regions XX and YY. The means for regions XX and YY also tend to be closer to each other than to ZZ. For instance, for xs_02, region XX has a mean of -.06 and region YY a mean of -.07 (slightly negative sentiment), while region ZZ has a mean of .14 (slightly positive sentiment).
kable(filter(df_region_continous_inputs_summary_XX,
grepl("xs", row.names(df_region_continous_inputs_summary_XX))),
digits = 2)
| variable | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xs_01 | 161 | 0.21 | 0.11 | 0.21 | -0.10 | 0.68 | 0.19 | 2.55 | 0.01 |
| xs_02 | 161 | -0.06 | 0.25 | -0.06 | -0.73 | 0.68 | -0.24 | 0.12 | 0.02 |
| xs_03 | 161 | 0.50 | 0.29 | 0.44 | -0.10 | 1.41 | 0.75 | 0.47 | 0.02 |
| xs_04 | 161 | 0.30 | 0.09 | 0.29 | 0.09 | 0.75 | 1.39 | 5.24 | 0.01 |
| xs_05 | 161 | 0.15 | 0.12 | 0.12 | 0.00 | 0.65 | 1.31 | 2.02 | 0.01 |
| xs_06 | 161 | 0.54 | 0.25 | 0.52 | 0.10 | 1.31 | 0.61 | -0.04 | 0.02 |
kable(filter(df_region_continous_inputs_summary_YY,
grepl("xs", row.names(df_region_continous_inputs_summary_YY))),
digits = 2)
| | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xs_01 | 222 | 0.21 | 0.12 | 0.21 | -0.18 | 0.63 | 0.02 | 1.83 | 0.01 |
| xs_02 | 222 | -0.07 | 0.25 | -0.07 | -0.90 | 0.63 | 0.03 | 0.27 | 0.02 |
| xs_03 | 222 | 0.51 | 0.30 | 0.49 | -0.18 | 1.79 | 0.61 | 1.21 | 0.02 |
| xs_04 | 222 | 0.30 | 0.10 | 0.29 | 0.10 | 0.90 | 1.90 | 7.71 | 0.01 |
| xs_05 | 222 | 0.14 | 0.14 | 0.11 | 0.00 | 0.90 | 1.83 | 4.65 | 0.01 |
| xs_06 | 222 | 0.53 | 0.22 | 0.52 | 0.10 | 1.18 | 0.41 | -0.09 | 0.01 |
kable(filter(df_region_continous_inputs_summary_ZZ,
grepl("xs", row.names(df_region_continous_inputs_summary_ZZ))),
digits = 2)
| | n | mean | sd | median | min | max | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|
| xs_01 | 294 | 0.22 | 0.16 | 0.24 | -0.36 | 0.75 | -0.15 | 1.26 | 0.01 |
| xs_02 | 294 | 0.14 | 0.19 | 0.14 | -0.46 | 0.69 | -0.24 | 0.48 | 0.01 |
| xs_03 | 294 | 0.32 | 0.23 | 0.29 | -0.36 | 1.28 | 0.62 | 1.88 | 0.01 |
| xs_04 | 294 | 0.30 | 0.12 | 0.29 | 0.00 | 0.69 | 0.51 | 0.55 | 0.01 |
| xs_05 | 294 | 0.25 | 0.14 | 0.22 | 0.00 | 0.69 | 0.72 | 0.26 | 0.01 |
| xs_06 | 294 | 0.37 | 0.17 | 0.35 | 0.00 | 1.23 | 0.79 | 1.58 | 0.01 |
input_names <- df_region_continous_inputs %>% select(starts_with("xs")) %>% colnames()
df_region_continous_inputs %>%
select(region, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "region")) %>%
ggplot(mapping = aes(x = value, fill = region)) +
geom_density(alpha = .33) +
scale_fill_brewer(palette="Set1") +
facet_wrap(~name, scales = "free") +
theme_bw()
input_names <- df_region_continous_inputs %>% select(starts_with("xs")) %>% colnames()
df_region_continous_inputs %>%
select(region, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "region")) %>%
ggplot(mapping = aes(x = value, fill = region)) +
geom_histogram(bins = 25, alpha = .5) +
scale_fill_brewer(palette="Set1") +
facet_wrap(~name, scales = "free") +
theme_bw()
Overall, it is interesting that regions XX and YY are quite similar to each other (and different from region ZZ) in terms of sentiment value distributions and summary statistics, and that this pattern persists across different types of sentiment-derived features and lexicons (e.g., Bing, NRC, etc.). Thus, even though the absolute values of the sentiment variables differ (likely because they are calculated in different ways), we see similar regional patterns of sentiment across lexicons, suggesting that the lexicons roughly agree on overall trends in sentiment.
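Similar regional distributions do not, on their own, show that the lexicons agree on individual reports. One way we could check that more directly is to correlate features across lexicons. The sketch below is hypothetical. It assumes that Spearman (rank) correlation is an adequate measure of agreement and that features with matching suffixes are comparable across lexicons. The source does not say whether the suffixes line up, and the toy columns are invented.

```r
# Hypothetical helper: Spearman correlations among sentiment features from
# several lexicons. Which columns are comparable across lexicons is an
# assumption; the toy columns below are invented.
lexicon_agreement <- function(data, prefixes = c("xb", "xn", "xa", "xs")) {
  pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")_")
  cols <- grep(pattern, names(data), value = TRUE)
  cor(data[, cols, drop = FALSE], method = "spearman",
      use = "pairwise.complete.obs")
}

toy <- data.frame(
  xb_01 = c(1, 2, 3, 4, 5, 6),
  xn_01 = c(2, 4, 5, 7, 9, 10),   # same ordering as xb_01
  xs_01 = c(6, 5, 4, 3, 2, 1)     # reversed ordering
)
round(lexicon_agreement(toy), 2)
```

The resulting matrix for the real data could be passed to `corrplot::corrplot()`, which this document already loads.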
df_customer_continous_inputs <- df %>% dplyr::select(customer, starts_with("x"))
df_customer_continous_inputs_summary <- df_customer_continous_inputs %>%
psych::describeBy(group = "customer")
customer_labels <- c("A", "B", "D", "E", "G", "K", "M", "Other", "Q") #define customer values for later use
#function to produce summary statistics (mean and +/- sd)
data_summary <- function(x) {
m <- mean(x)
ymin <- m-sd(x)
ymax <- m+sd(x)
return(c(y=m,ymin=ymin,ymax=ymax))
}
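`data_summary()` returns the named vector `c(y, ymin, ymax)` that `stat_summary(fun.data = ...)` expects. However, `mean()` and `sd()` return `NA` if any value is missing. The variant below is a sketch that assumes dropping missing values is acceptable. The name `data_summary_na()` is hypothetical. The example output was worked out by hand: the mean of 1, 2, 3 is 2 and the standard deviation is 1.

```r
# Hypothetical variant of data_summary() that drops missing values first
# (an added assumption; the original returns NA if any value is missing).
data_summary_na <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sd(x)
  c(y = m, ymin = m - s, ymax = m + s)
}

data_summary_na(c(1, 2, 3, NA))  # returns c(y = 2, ymin = 1, ymax = 3)
```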
xb_01
In general, average xb_01 sentiment values are similar
across customers (roughly 3 to 4), while the variability in sentiment values differs
across customers. For instance, customer G has the widest range of
values, from -4 to 14, and customer E has the narrowest, from 1 to 6.
This suggests that sales reps’ interactions with customer E were
consistently positive. Interactions with customers B, D, and K were never
negative either (their minimum values are 0 or higher).
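The range comparison above can be computed directly rather than read from the table. The sketch below is hypothetical (the name `customer_ranges()` is ours). It uses base `tapply()` and sorts customers from widest to narrowest range. The toy values only echo the xb_01 extremes and are not real data.

```r
# Hypothetical helper: min, max and range of one feature per customer,
# sorted from widest to narrowest range.
customer_ranges <- function(x, customer) {
  lo <- tapply(x, customer, min, na.rm = TRUE)
  hi <- tapply(x, customer, max, na.rm = TRUE)
  out <- data.frame(customer = names(lo), min = as.vector(lo), max = as.vector(hi))
  out$range <- out$max - out$min
  out <- out[order(out$range, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

# Toy values echoing the xb_01 extremes quoted above
customer_ranges(x = c(-4, 14, 3, 1, 6, 4),
                customer = c("G", "G", "G", "E", "E", "E"))
```

On the real data, `customer_ranges(df$xb_01, df$customer)` would list customer G first if the table above is accurate.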
t <- df_customer_continous_inputs_summary #temp object
n <- "xb_01" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
| | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 3.55 | 2.87 | 3.00 | 3.57 | 2.97 | -4.0 | 10.00 | 14.00 | -0.09 | -0.47 | 0.39 |
| B | 52 | 3.05 | 1.03 | 3.04 | 3.06 | 0.42 | 0.0 | 6.33 | 6.33 | -0.01 | 3.80 | 0.14 |
| D | 32 | 3.78 | 1.94 | 3.77 | 3.63 | 0.84 | 1.0 | 12.00 | 11.00 | 2.10 | 7.59 | 0.34 |
| E | 35 | 3.54 | 1.08 | 3.57 | 3.54 | 0.85 | 1.0 | 6.00 | 5.00 | 0.01 | 0.28 | 0.18 |
| G | 113 | 3.51 | 2.51 | 3.38 | 3.45 | 1.65 | -4.0 | 14.00 | 18.00 | 0.54 | 2.61 | 0.24 |
| K | 38 | 3.67 | 2.02 | 4.00 | 3.50 | 1.11 | 0.0 | 11.00 | 11.00 | 1.14 | 2.94 | 0.33 |
| M | 71 | 3.72 | 2.34 | 3.43 | 3.58 | 2.33 | -0.5 | 10.00 | 10.50 | 0.44 | -0.50 | 0.28 |
| Other | 245 | 3.13 | 1.60 | 3.08 | 3.10 | 1.34 | -1.5 | 10.00 | 11.50 | 0.52 | 2.60 | 0.10 |
| Q | 36 | 3.32 | 2.33 | 3.58 | 3.43 | 2.41 | -2.0 | 8.00 | 10.00 | -0.40 | -0.51 | 0.39 |
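Each per-variable table in this section repeats the same `rbind(t$A[n,], t$B[n,], ...)` call. A small helper could replace that repetition. This is a hypothetical sketch. It assumes the `describeBy()` result can be indexed by customer name with `[[`, as `t$A` does above. The toy list stands in for that result.

```r
# Hypothetical helper: pull one variable's row out of every group in a
# describeBy()-style list and stack them, replacing the repeated rbind() calls.
stack_group_rows <- function(summary_list, var, groups = names(summary_list)) {
  rows <- lapply(groups, function(g) summary_list[[g]][var, , drop = FALSE])
  out <- do.call(rbind, rows)
  out$vars <- NULL          # mirrors select(-vars) in the original chunk
  rownames(out) <- groups   # one row per customer
  out
}

# Toy stand-in for a describeBy() result with two customers
toy_list <- list(
  A = data.frame(vars = 1:2, n = c(5, 5), mean = c(1.5, 2.5),
                 row.names = c("xb_01", "xb_02")),
  B = data.frame(vars = 1:2, n = c(4, 4), mean = c(0.5, 3.0),
                 row.names = c("xb_01", "xb_02"))
)
stack_group_rows(toy_list, "xb_02")
```

With this helper, each table could become `kable(stack_group_rows(df_customer_continous_inputs_summary, "xb_02", customer_labels), digits = 2)`.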
df %>%
select(customer, all_of(n)) %>% # all_of() makes selecting the external vector `n` explicit
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#DE7A98",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#D33C69") + #pink!
ylab("sentiment value")
xb_02
Unlike what we saw for xb_01, the average
xb_02 sentiment values differ across customers, while the
variability in sentiment values (the standard deviation) is similar
across customers. Customer E has the highest mean sentiment value at
9.09 and customer A has the lowest at 4.00. Every customer's mean is
positive, so the sentiment is overwhelmingly positive across customers.
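The judgments "means differ" and "variability is similar" are made by eye. They could be checked formally. The sketch below swaps in two standard base-R tests that are not part of the original analysis: Welch's one-way ANOVA (`oneway.test()`) for means and the Fligner-Killeen test (`fligner.test()`) for spread. Both assume independent observations, which repeated reports about the same customer may violate. The p-values should therefore be read as rough guides. The helper name `compare_groups()` is hypothetical.

```r
# Hypothetical helper: p-values for differences in means (Welch one-way ANOVA)
# and in spread (Fligner-Killeen) across groups. Assumes independent observations.
compare_groups <- function(x, group) {
  group <- factor(group)
  c(
    mean_p   = unname(oneway.test(x ~ group)$p.value),
    spread_p = unname(fligner.test(x, group)$p.value)
  )
}

# Toy data: equal spread, means 10 units apart
compare_groups(c(1, 2, 3, 4, 5, 11, 12, 13, 14, 15), rep(c("a", "b"), each = 5))
```

On the real data, the call would be `compare_groups(df$xb_02, df$customer)`.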
t <- df_customer_continous_inputs_summary #temp object
n <- "xb_02" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
| | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 4.00 | 2.98 | 4.0 | 4.09 | 2.97 | -4 | 10 | 14 | -0.33 | -0.51 | 0.40 |
| B | 52 | 7.48 | 3.15 | 8.0 | 7.74 | 2.97 | 0 | 13 | 13 | -0.58 | -0.45 | 0.44 |
| D | 32 | 8.25 | 3.51 | 8.0 | 8.35 | 3.71 | 1 | 15 | 14 | -0.17 | -0.85 | 0.62 |
| E | 35 | 9.09 | 2.99 | 9.0 | 9.24 | 1.48 | 2 | 15 | 13 | -0.43 | 0.17 | 0.51 |
| G | 113 | 4.65 | 2.90 | 5.0 | 4.77 | 2.97 | -4 | 14 | 18 | -0.21 | 0.51 | 0.27 |
| K | 38 | 5.16 | 3.04 | 4.5 | 5.03 | 3.71 | 0 | 11 | 11 | 0.38 | -0.79 | 0.49 |
| M | 71 | 4.82 | 2.81 | 5.0 | 4.75 | 2.97 | 0 | 10 | 10 | 0.10 | -1.14 | 0.33 |
| Other | 245 | 5.99 | 3.14 | 6.0 | 5.98 | 2.97 | -1 | 15 | 16 | 0.08 | -0.53 | 0.20 |
| Q | 36 | 4.72 | 3.09 | 6.0 | 4.87 | 2.97 | -2 | 10 | 12 | -0.51 | -0.72 | 0.51 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#DE7A98",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#D33C69") + #pink!
ylab("sentiment value")
xb_03
Average xb_03 sentiment values and their variability
differ across customers. The sentiment values associated with some
customers are generally negative or neutral. Customers B and E have
negative means, and customers D and Other have means close to 0 (both
with medians of 0). For the remaining customers (A, G, K, M, and Q),
sentiment is generally positive (means greater than 0).
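"Generally negative or neutral" can be made more concrete by counting how often each customer's values fall below, at, or above zero, rather than relying on the mean alone. The helper `sign_shares()` below is a hypothetical sketch using base `table()` and `prop.table()`. The toy values are invented.

```r
# Hypothetical helper: share of negative, zero and positive values per customer.
sign_shares <- function(x, customer) {
  s <- factor(sign(x), levels = c(-1, 0, 1),
              labels = c("negative", "zero", "positive"))
  prop.table(table(customer = customer, sign = s), margin = 1)  # row proportions
}

sign_shares(x = c(-2, -1, 0, 3, 1, 2, 3, 0),
            customer = rep(c("B", "G"), each = 4))
```

Each row sums to 1. On the real data, `sign_shares(df$xb_03, df$customer)` would show, for example, what fraction of customer B's reports score below zero.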
t <- df_customer_continous_inputs_summary #temp object
n <- "xb_03" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
| | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 3.13 | 3.07 | 3.0 | 3.13 | 2.97 | -4 | 10 | 14 | 0.02 | -0.67 | 0.41 |
| B | 52 | -1.27 | 2.86 | -2.0 | -1.45 | 2.97 | -6 | 6 | 12 | 0.52 | -0.57 | 0.40 |
| D | 32 | 0.25 | 3.36 | 0.0 | -0.15 | 2.22 | -4 | 12 | 16 | 1.38 | 2.51 | 0.59 |
| E | 35 | -0.74 | 2.24 | -1.0 | -0.76 | 1.48 | -5 | 3 | 8 | 0.34 | -0.82 | 0.38 |
| G | 113 | 2.28 | 2.98 | 2.0 | 2.09 | 2.97 | -4 | 14 | 18 | 0.80 | 1.39 | 0.28 |
| K | 38 | 2.39 | 2.38 | 2.0 | 2.19 | 1.48 | -1 | 11 | 12 | 1.34 | 2.90 | 0.39 |
| M | 71 | 2.76 | 2.47 | 2.0 | 2.54 | 1.48 | -2 | 10 | 12 | 0.75 | 0.37 | 0.29 |
| Other | 245 | 0.51 | 2.62 | 0.0 | 0.36 | 2.97 | -7 | 10 | 17 | 0.61 | 0.87 | 0.17 |
| Q | 36 | 1.83 | 2.36 | 1.5 | 1.70 | 2.22 | -2 | 8 | 10 | 0.54 | -0.29 | 0.39 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#DE7A98",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#D33C69") + #pink!
ylab("sentiment value")
xb_04
Average xb_04 sentiment values appear similar across
customers, while their variability seems to differ across customers. For
instance, customer A’s range of values is relatively wide, from -2 to
3.5, whereas customer D’s range of values is relatively narrow and only
positive, from .22 to 2.
t <- df_customer_continous_inputs_summary #temp object
n <- "xb_04" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
| | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 1.14 | 0.98 | 1.00 | 1.14 | 0.99 | -2.00 | 3.5 | 5.50 | -0.24 | 1.23 | 0.13 |
| B | 52 | 1.06 | 0.41 | 1.01 | 1.04 | 0.18 | 0.00 | 3.0 | 3.00 | 1.65 | 9.16 | 0.06 |
| D | 32 | 1.10 | 0.35 | 1.15 | 1.11 | 0.21 | 0.22 | 2.0 | 1.77 | -0.27 | 0.48 | 0.06 |
| E | 35 | 1.18 | 0.37 | 1.20 | 1.20 | 0.18 | 0.25 | 2.0 | 1.75 | -0.57 | 0.90 | 0.06 |
| G | 113 | 1.23 | 0.89 | 1.28 | 1.24 | 0.66 | -1.00 | 4.0 | 5.00 | 0.10 | 1.02 | 0.08 |
| K | 38 | 1.33 | 0.55 | 1.39 | 1.33 | 0.50 | 0.00 | 2.6 | 2.60 | -0.08 | -0.12 | 0.09 |
| M | 71 | 1.25 | 0.80 | 1.25 | 1.21 | 0.62 | -0.12 | 5.0 | 5.12 | 1.42 | 5.23 | 0.09 |
| Other | 245 | 1.09 | 0.61 | 1.04 | 1.07 | 0.37 | -0.50 | 5.0 | 5.50 | 1.98 | 10.96 | 0.04 |
| Q | 36 | 1.11 | 0.73 | 1.25 | 1.14 | 0.44 | -0.50 | 3.0 | 3.50 | -0.29 | 0.41 | 0.12 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#DE7A98",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#D33C69") + #pink!
ylab("sentiment value")
xb_05
Mean xb_05 sentiment values and their variability appear
to differ across customers. For instance, customers B and E have
negative mean sentiment values and customer D has a mean close to 0,
while the other customers generally have positive mean values. The
“Other” group of customers is associated with roughly neutral sentiment
(mean of 0.18).
t <- df_customer_continous_inputs_summary #temp object
n <- "xb_05" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
| | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 0.99 | 1.04 | 1.00 | 0.98 | 1.11 | -2.0 | 3.50 | 5.50 | 0.00 | 0.54 | 0.14 |
| B | 52 | -0.44 | 1.12 | -0.88 | -0.51 | 1.11 | -2.0 | 3.00 | 5.00 | 0.71 | 0.20 | 0.16 |
| D | 32 | -0.04 | 0.87 | 0.00 | -0.10 | 0.82 | -1.5 | 2.00 | 3.50 | 0.41 | -0.68 | 0.15 |
| E | 35 | -0.32 | 0.75 | -0.50 | -0.34 | 0.74 | -2.0 | 1.00 | 3.00 | 0.15 | -0.73 | 0.13 |
| G | 113 | 0.80 | 0.99 | 0.67 | 0.76 | 0.99 | -1.0 | 4.00 | 5.00 | 0.42 | -0.08 | 0.09 |
| K | 38 | 0.83 | 0.73 | 1.00 | 0.84 | 0.74 | -1.0 | 2.25 | 3.25 | -0.31 | -0.19 | 0.12 |
| M | 71 | 0.95 | 0.85 | 1.00 | 0.90 | 0.74 | -1.0 | 5.00 | 6.00 | 1.47 | 5.71 | 0.10 |
| Other | 245 | 0.18 | 1.06 | 0.00 | 0.14 | 0.99 | -3.0 | 5.00 | 8.00 | 0.67 | 2.50 | 0.07 |
| Q | 36 | 0.66 | 0.77 | 1.00 | 0.63 | 0.79 | -0.5 | 3.00 | 3.50 | 0.55 | 0.45 | 0.13 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#DE7A98",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#D33C69") + #pink!
ylab("sentiment value")
xb_06
Average xb_06 sentiment values appear to differ across
customers, while their variability appears to be similar across
customers. All average sentiment values are positive, ranging from 1.28
to 3.26, with standard deviations ranging from 1.01 to 1.85.
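The "ranging from ... to ..." figures above summarize how much the per-customer means and standard deviations themselves vary. The hypothetical helper below computes both ranges in one call. The toy data was chosen so the answer can be checked by hand: customer A has mean 2 and sd 1, and customer B has mean 4 and sd 2.

```r
# Hypothetical helper: lowest and highest per-customer mean and standard deviation.
spread_of_group_stats <- function(x, customer) {
  means <- tapply(x, customer, mean, na.rm = TRUE)
  sds <- tapply(x, customer, sd, na.rm = TRUE)
  out <- rbind(mean = range(means), sd = range(sds))
  colnames(out) <- c("lowest", "highest")
  out
}

spread_of_group_stats(x = c(1, 2, 3, 2, 4, 6),
                      customer = rep(c("A", "B"), each = 3))
```

On the real data, `spread_of_group_stats(df$xb_06, df$customer)` should reproduce the 1.28 to 3.26 and 1.01 to 1.85 ranges, up to rounding.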
t <- df_customer_continous_inputs_summary #temp object
n <- "xb_06" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
| | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 1.28 | 1.02 | 1.33 | 1.29 | 0.99 | -2.00 | 3.5 | 5.50 | -0.42 | 0.93 | 0.14 |
| B | 52 | 3.26 | 1.85 | 3.00 | 3.18 | 1.67 | 0.00 | 9.0 | 9.00 | 0.55 | 0.09 | 0.26 |
| D | 32 | 2.63 | 1.40 | 2.67 | 2.58 | 1.73 | 0.50 | 5.5 | 5.00 | 0.35 | -0.97 | 0.25 |
| E | 35 | 3.15 | 1.34 | 3.00 | 3.14 | 1.48 | 0.50 | 6.0 | 5.50 | 0.09 | -0.44 | 0.23 |
| G | 113 | 1.74 | 1.33 | 1.50 | 1.66 | 0.74 | -1.00 | 8.0 | 9.00 | 1.06 | 3.58 | 0.13 |
| K | 38 | 1.83 | 1.06 | 1.58 | 1.75 | 0.86 | 0.00 | 5.0 | 5.00 | 0.89 | 0.60 | 0.17 |
| M | 71 | 1.60 | 1.01 | 1.33 | 1.52 | 0.99 | 0.00 | 5.0 | 5.00 | 1.08 | 1.74 | 0.12 |
| Other | 245 | 2.24 | 1.34 | 2.00 | 2.16 | 1.48 | -0.33 | 9.0 | 9.33 | 1.01 | 2.28 | 0.09 |
| Q | 36 | 1.76 | 1.51 | 1.67 | 1.58 | 0.86 | -0.50 | 7.0 | 7.50 | 1.53 | 3.38 | 0.25 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#DE7A98",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#D33C69") + #pink!
ylab("sentiment value")
xb_07
Mean xb_07 sentiment values appear similar across
customers, while their variability appears to differ. For instance,
customer A has a mean sentiment value of 2.20, a standard deviation of
1.48 (much greater than that of any other customer), and a range of -1 to 6. In
contrast, customer D has a mean sentiment value of 2.06, a standard
deviation of .42, and a range of 1 to 3. Overall, on average, the
sentiment is positive.
t <- df_customer_continous_inputs_summary #temp object
n <- "xb_07" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
| | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 2.20 | 1.48 | 2.00 | 2.11 | 1.48 | -1.00 | 6.00 | 7.00 | 0.53 | 0.35 | 0.20 |
| B | 52 | 2.01 | 0.50 | 1.97 | 1.97 | 0.24 | 1.00 | 4.00 | 3.00 | 1.33 | 4.22 | 0.07 |
| D | 32 | 2.06 | 0.42 | 2.14 | 2.09 | 0.22 | 1.00 | 3.00 | 2.00 | -0.57 | 0.15 | 0.07 |
| E | 35 | 2.12 | 0.46 | 2.13 | 2.11 | 0.29 | 0.67 | 3.22 | 2.56 | -0.10 | 1.98 | 0.08 |
| G | 113 | 2.28 | 0.99 | 2.15 | 2.26 | 0.76 | -1.00 | 5.00 | 6.00 | -0.09 | 1.62 | 0.09 |
| K | 38 | 2.30 | 0.89 | 2.29 | 2.23 | 0.43 | 1.00 | 5.00 | 4.00 | 0.68 | 0.80 | 0.14 |
| M | 71 | 2.12 | 0.85 | 2.00 | 2.09 | 0.99 | 0.50 | 5.00 | 4.50 | 0.43 | 0.38 | 0.10 |
| Other | 245 | 1.96 | 0.71 | 2.00 | 1.93 | 0.49 | 0.00 | 7.00 | 7.00 | 1.91 | 11.86 | 0.05 |
| Q | 36 | 2.15 | 1.01 | 2.00 | 2.05 | 0.74 | 0.00 | 5.00 | 5.00 | 0.93 | 1.64 | 0.17 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#DE7A98",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#D33C69") + #pink!
ylab("sentiment value")
xb_08
Mean xb_08 sentiment values appear similar across
customers, while their variability differs. Overall, on average, the
sentiment is neutral to positive.
t <- df_customer_continous_inputs_summary #temp object
n <- "xb_08" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
| | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 0.07 | 1.16 | 0.00 | 0.02 | 1.48 | -3.0 | 3.00 | 6.00 | 0.08 | -0.22 | 0.16 |
| B | 52 | 0.11 | 0.59 | 0.07 | 0.08 | 0.27 | -1.0 | 3.00 | 4.00 | 1.89 | 9.05 | 0.08 |
| D | 32 | 0.13 | 0.61 | 0.18 | 0.17 | 0.35 | -1.5 | 1.50 | 3.00 | -0.50 | 0.78 | 0.11 |
| E | 35 | 0.24 | 0.50 | 0.32 | 0.26 | 0.34 | -1.5 | 1.50 | 3.00 | -0.85 | 2.96 | 0.08 |
| G | 113 | 0.25 | 1.17 | 0.33 | 0.25 | 0.99 | -4.0 | 4.00 | 8.00 | -0.09 | 1.49 | 0.11 |
| K | 38 | 0.40 | 0.89 | 0.54 | 0.48 | 0.69 | -2.0 | 2.00 | 4.00 | -0.98 | 0.97 | 0.14 |
| M | 71 | 0.33 | 1.22 | 0.33 | 0.25 | 0.99 | -2.0 | 5.00 | 7.00 | 0.69 | 1.48 | 0.15 |
| Other | 245 | 0.19 | 0.86 | 0.16 | 0.17 | 0.55 | -2.0 | 5.00 | 7.00 | 1.27 | 6.26 | 0.06 |
| Q | 36 | 0.23 | 1.09 | 0.50 | 0.37 | 0.74 | -4.0 | 1.67 | 5.67 | -1.78 | 4.11 | 0.18 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#DE7A98",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#D33C69") + #pink!
ylab("sentiment value")
xn_01
Mean xn_01 sentiment values and their variability differ
across customers. On average, the sentiment is positive, with customer
means ranging from 1.04 (customer K) to 2.37 (customer D).
t <- df_customer_continous_inputs_summary #temp object
n <- "xn_01" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
| | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 1.39 | 2.20 | 1.00 | 1.30 | 1.48 | -4.00 | 8.00 | 12.00 | 0.36 | 0.36 | 0.30 |
| B | 52 | 1.69 | 0.78 | 1.74 | 1.71 | 0.38 | -2.00 | 3.00 | 5.00 | -1.73 | 7.75 | 0.11 |
| D | 32 | 2.37 | 2.02 | 2.43 | 2.30 | 0.84 | -2.00 | 10.00 | 12.00 | 1.18 | 4.84 | 0.36 |
| E | 35 | 2.09 | 0.83 | 2.12 | 2.13 | 0.56 | -0.22 | 3.71 | 3.94 | -0.64 | 1.12 | 0.14 |
| G | 113 | 1.57 | 2.11 | 1.17 | 1.48 | 1.24 | -4.00 | 9.00 | 13.00 | 0.67 | 2.19 | 0.20 |
| K | 38 | 1.04 | 1.48 | 1.00 | 1.06 | 1.48 | -2.00 | 4.00 | 6.00 | -0.11 | -0.64 | 0.24 |
| M | 71 | 1.75 | 2.31 | 2.00 | 1.83 | 1.48 | -4.00 | 7.00 | 11.00 | -0.41 | 0.01 | 0.27 |
| Other | 245 | 1.46 | 1.40 | 1.50 | 1.45 | 0.74 | -3.50 | 8.00 | 11.50 | 0.26 | 3.00 | 0.09 |
| Q | 36 | 1.18 | 2.16 | 1.17 | 1.17 | 1.73 | -3.00 | 6.25 | 9.25 | 0.04 | -0.31 | 0.36 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#F7B065",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#FF8300") + #orange!
ylab("sentiment value")
xn_02
Mean xn_02 sentiment values differ across customers
while their variability is relatively similar. On average, the sentiment
is positive.
t <- df_customer_continous_inputs_summary #temp object
n <- "xn_02" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
| | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 1.85 | 2.34 | 2 | 1.82 | 2.97 | -4 | 8 | 12 | 0.10 | -0.07 | 0.32 |
| B | 52 | 6.06 | 2.83 | 6 | 6.05 | 2.97 | -2 | 12 | 14 | -0.15 | -0.07 | 0.39 |
| D | 32 | 5.91 | 3.29 | 6 | 5.96 | 3.71 | -2 | 11 | 13 | -0.15 | -0.75 | 0.58 |
| E | 35 | 6.63 | 2.43 | 7 | 6.76 | 2.97 | 1 | 10 | 9 | -0.43 | -0.79 | 0.41 |
| G | 113 | 2.74 | 2.57 | 2 | 2.75 | 2.97 | -4 | 9 | 13 | 0.08 | 0.14 | 0.24 |
| K | 38 | 2.13 | 2.22 | 2 | 2.09 | 2.97 | -2 | 7 | 9 | 0.04 | -0.67 | 0.36 |
| M | 71 | 2.63 | 2.71 | 3 | 2.81 | 2.97 | -4 | 7 | 11 | -0.55 | -0.36 | 0.32 |
| Other | 245 | 3.98 | 2.60 | 4 | 4.00 | 2.97 | -2 | 13 | 15 | 0.08 | 0.29 | 0.17 |
| Q | 36 | 2.50 | 3.04 | 3 | 2.43 | 2.97 | -3 | 10 | 13 | 0.12 | -0.30 | 0.51 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#F7B065",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#FF8300") + #orange!
ylab("sentiment value")
xn_03
Mean xn_03 sentiment values and their variability differ
across customers. Customer B has the most negative mean sentiment value
at -2.35 while customer A has the most positive mean sentiment at
.91.
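Identifying the customers with the most negative and most positive means, as done above, is a small computation over the per-customer means. The helper `extreme_customers()` is a hypothetical sketch. The toy values are invented so that customer B has the lowest mean and customer A the highest, mirroring the xn_03 result.

```r
# Hypothetical helper: customers with the lowest and highest mean of one feature.
extreme_customers <- function(x, customer) {
  means <- tapply(x, customer, mean, na.rm = TRUE)
  c(most_negative = names(which.min(means)),
    most_positive = names(which.max(means)))
}

extreme_customers(x = c(-3, -2, 1, 1, 0, 0),
                  customer = c("B", "B", "A", "A", "K", "K"))
```

If several customers tie, `which.min()` and `which.max()` return the first one in alphabetical (factor level) order.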
t <- df_customer_continous_inputs_summary #temp object
n <- "xn_03" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
| | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 0.91 | 2.54 | 0.0 | 0.87 | 2.97 | -4 | 8 | 12 | 0.32 | -0.27 | 0.34 |
| B | 52 | -2.35 | 2.60 | -3.0 | -2.55 | 2.97 | -6 | 3 | 9 | 0.53 | -0.72 | 0.36 |
| D | 32 | -0.81 | 3.21 | -2.0 | -1.27 | 2.97 | -4 | 10 | 14 | 1.36 | 1.90 | 0.57 |
| E | 35 | -1.66 | 1.94 | -2.0 | -1.62 | 2.97 | -5 | 2 | 7 | -0.03 | -1.03 | 0.33 |
| G | 113 | 0.54 | 2.69 | 0.0 | 0.44 | 2.97 | -5 | 9 | 14 | 0.49 | 0.69 | 0.25 |
| K | 38 | 0.00 | 1.68 | 0.0 | -0.06 | 1.48 | -3 | 4 | 7 | 0.23 | -0.54 | 0.27 |
| M | 71 | 0.79 | 2.53 | 1.0 | 0.72 | 2.97 | -5 | 7 | 12 | 0.18 | -0.35 | 0.30 |
| Other | 245 | -0.93 | 2.49 | -1.0 | -0.99 | 2.97 | -7 | 8 | 15 | 0.33 | 0.37 | 0.16 |
| Q | 36 | -0.17 | 2.13 | -0.5 | -0.27 | 2.22 | -4 | 4 | 8 | 0.37 | -0.72 | 0.36 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#F7B065",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#FF8300") + #orange!
ylab("sentiment value")
xn_04
Mean xn_04 sentiment values appear relatively similar
across customers while their variability differs. On average, the
sentiment is positive.
t <- df_customer_continous_inputs_summary #temp object
n <- "xn_04" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
| | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 0.50 | 0.85 | 0.44 | 0.46 | 0.83 | -1.00 | 4.00 | 5.00 | 1.08 | 3.39 | 0.11 |
| B | 52 | 0.64 | 0.31 | 0.61 | 0.62 | 0.18 | -0.33 | 2.00 | 2.33 | 1.14 | 6.95 | 0.04 |
| D | 32 | 0.79 | 0.54 | 0.87 | 0.82 | 0.25 | -1.00 | 2.00 | 3.00 | -0.99 | 2.46 | 0.10 |
| E | 35 | 0.79 | 0.28 | 0.80 | 0.81 | 0.18 | -0.13 | 1.50 | 1.63 | -0.74 | 2.48 | 0.05 |
| G | 113 | 0.62 | 0.89 | 0.50 | 0.60 | 0.74 | -2.00 | 5.00 | 7.00 | 1.05 | 5.26 | 0.08 |
| K | 38 | 0.49 | 0.86 | 0.40 | 0.41 | 0.59 | -1.00 | 4.00 | 5.00 | 1.70 | 5.30 | 0.14 |
| M | 71 | 0.63 | 0.96 | 0.75 | 0.68 | 0.59 | -4.00 | 2.83 | 6.83 | -1.52 | 6.08 | 0.11 |
| Other | 245 | 0.58 | 0.62 | 0.54 | 0.56 | 0.39 | -2.00 | 3.00 | 5.00 | 0.49 | 3.53 | 0.04 |
| Q | 36 | 0.53 | 0.83 | 0.48 | 0.49 | 0.74 | -1.00 | 3.00 | 4.00 | 0.50 | 0.54 | 0.14 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#F7B065",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#FF8300") + #orange!
ylab("sentiment value")
xn_05
Mean xn_05 sentiment values differ across customers while their
variability is similar. Customers B, D, E, and Other have negative means,
while the remaining customers have means at or slightly above 0.
t <- df_customer_continous_inputs_summary #temp object
n <- "xn_05" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
| | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 0.30 | 0.98 | 0.00 | 0.26 | 0.99 | -1.5 | 4.0 | 5.5 | 0.79 | 1.84 | 0.13 |
| B | 52 | -0.91 | 1.08 | -1.00 | -0.95 | 0.99 | -3.0 | 2.0 | 5.0 | 0.38 | -0.09 | 0.15 |
| D | 32 | -0.43 | 1.08 | -0.83 | -0.45 | 0.74 | -3.0 | 2.0 | 5.0 | 0.23 | -0.25 | 0.19 |
| E | 35 | -0.76 | 0.92 | -1.00 | -0.76 | 0.99 | -3.0 | 1.0 | 4.0 | -0.11 | -0.48 | 0.15 |
| G | 113 | 0.18 | 1.15 | 0.00 | 0.17 | 0.99 | -3.0 | 5.0 | 8.0 | 0.54 | 2.34 | 0.11 |
| K | 38 | 0.12 | 0.97 | 0.00 | 0.02 | 0.86 | -1.0 | 4.0 | 5.0 | 1.73 | 4.66 | 0.16 |
| M | 71 | 0.22 | 1.05 | 0.33 | 0.29 | 0.99 | -4.0 | 2.5 | 6.5 | -1.22 | 3.31 | 0.13 |
| Other | 245 | -0.32 | 1.02 | -0.50 | -0.37 | 0.74 | -3.0 | 3.0 | 6.0 | 0.60 | 0.59 | 0.07 |
| Q | 36 | 0.02 | 0.92 | -0.17 | -0.08 | 0.80 | -1.0 | 3.0 | 4.0 | 1.12 | 1.19 | 0.15 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#F7B065",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#FF8300") + #orange!
ylab("sentiment value")
xn_06
Mean xn_06 sentiment values differ across customers
while their variability is similar. On average, the sentiment is
positive.
t <- df_customer_continous_inputs_summary #temp object
n <- "xn_06" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
| | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 0.69 | 0.90 | 0.83 | 0.66 | 0.74 | -1.00 | 4 | 5.00 | 0.76 | 2.27 | 0.12 |
| B | 52 | 2.54 | 1.34 | 2.25 | 2.44 | 1.11 | -0.33 | 6 | 6.33 | 0.54 | -0.16 | 0.19 |
| D | 32 | 2.25 | 1.37 | 2.00 | 2.17 | 1.48 | -1.00 | 6 | 7.00 | 0.47 | 0.66 | 0.24 |
| E | 35 | 2.72 | 1.28 | 2.50 | 2.60 | 0.74 | 0.60 | 6 | 5.40 | 0.78 | 0.47 | 0.22 |
| G | 113 | 1.08 | 1.04 | 1.00 | 1.07 | 0.74 | -2.00 | 5 | 7.00 | 0.33 | 1.72 | 0.10 |
| K | 38 | 0.83 | 0.96 | 1.00 | 0.80 | 1.11 | -1.00 | 4 | 5.00 | 0.62 | 1.29 | 0.16 |
| M | 71 | 0.99 | 1.19 | 1.00 | 0.99 | 0.74 | -4.00 | 6 | 10.00 | -0.02 | 6.49 | 0.14 |
| Other | 245 | 1.65 | 1.28 | 1.50 | 1.55 | 0.74 | -2.00 | 7 | 9.00 | 0.97 | 2.26 | 0.08 |
| Q | 36 | 1.02 | 1.10 | 1.00 | 1.00 | 1.24 | -1.00 | 3 | 4.00 | 0.10 | -0.74 | 0.18 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#F7B065",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#FF8300") + #orange!
ylab("sentiment value")
xn_07
Mean xn_07 sentiment values appear relatively similar
across customers while their variability differs. On average, the
sentiment is positive.
t <- df_customer_continous_inputs_summary #temp object
n <- "xn_07" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
| | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 1.41 | 1.10 | 1.00 | 1.35 | 0.74 | -1.00 | 5.00 | 6.00 | 0.68 | 1.27 | 0.15 |
| B | 52 | 1.53 | 0.36 | 1.44 | 1.48 | 0.23 | 1.00 | 3.00 | 2.00 | 1.73 | 4.27 | 0.05 |
| D | 32 | 1.68 | 0.61 | 1.79 | 1.69 | 0.31 | 0.00 | 3.00 | 3.00 | -0.38 | 1.01 | 0.11 |
| E | 35 | 1.61 | 0.32 | 1.69 | 1.62 | 0.28 | 0.89 | 2.22 | 1.33 | -0.21 | -0.46 | 0.05 |
| G | 113 | 1.39 | 0.98 | 1.20 | 1.36 | 0.44 | -2.00 | 5.00 | 7.00 | 0.26 | 3.04 | 0.09 |
| K | 38 | 1.11 | 0.80 | 1.00 | 1.10 | 0.37 | -1.00 | 4.00 | 5.00 | 0.74 | 3.47 | 0.13 |
| M | 71 | 1.52 | 1.06 | 1.50 | 1.56 | 0.74 | -4.00 | 4.00 | 8.00 | -1.81 | 8.87 | 0.13 |
| Other | 245 | 1.36 | 0.60 | 1.35 | 1.36 | 0.51 | -2.00 | 3.00 | 5.00 | -0.64 | 4.99 | 0.04 |
| Q | 36 | 1.28 | 0.73 | 1.20 | 1.26 | 0.47 | 0.00 | 3.25 | 3.25 | 0.52 | 0.44 | 0.12 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#F7B065",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#FF8300") + #orange!
ylab("sentiment value")
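The violin-plot code is also copied for every feature, with only the selected column and two colors changing. A hedged sketch of a reusable version follows. `plot_by_customer()` is a hypothetical name; it indexes the column with ggplot2's `.data[[var]]` pronoun instead of reshaping with `pivot_longer()`; and because `data_summary()` is defined elsewhere in the report, it swaps in an inline mean ± 1 SD summary through `stat_summary()`'s `fun`, `fun.min`, and `fun.max` arguments (available since ggplot2 3.3.0).

```r
library(ggplot2)

# Hypothetical helper: violin + jittered points + mean +/- 1 SD for one column
plot_by_customer <- function(data, var,
                             point_col = "#F7B065", summary_col = "#FF8300") {
  ggplot(data, aes(x = customer, y = .data[[var]])) +
    geom_violin() +
    geom_jitter(shape = 16, position = position_jitter(0.2),
                color = point_col, alpha = .33) +
    # inline stand-in for data_summary(): mean with a +/- 1 SD range
    stat_summary(fun = mean,
                 fun.min = function(x) mean(x) - sd(x),
                 fun.max = function(x) mean(x) + sd(x),
                 geom = "pointrange", color = summary_col) +
    ylab("sentiment value")
}

# Toy data standing in for df
set.seed(1)
toy <- data.frame(customer = rep(c("A", "B"), each = 20),
                  xn_07 = rnorm(40, mean = 1.4))
p <- plot_by_customer(toy, "xn_07")
```

Printing `p` draws the plot; for the AFINN features, something like `plot_by_customer(df, "xa_01", "#7FD05C", "#53A72E")` would reproduce the green color scheme.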
xn_08
Mean xn_08 sentiment values appear relatively similar across customers,
while their variability differs. On average, the sentiment is negative.
t <- df_customer_continous_inputs_summary #temp object
n <- "xn_08" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
|   | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | -0.40 | 1.21 | -0.50 | -0.45 | 0.74 | -3.00 | 4 | 7.00 | 0.77 | 1.54 | 0.16 |
| B | 52 | -0.34 | 0.51 | -0.33 | -0.37 | 0.24 | -1.40 | 2 | 3.40 | 1.86 | 7.54 | 0.07 |
| D | 32 | -0.11 | 0.75 | -0.14 | -0.13 | 0.52 | -2.00 | 2 | 4.00 | 0.31 | 1.08 | 0.13 |
| E | 35 | -0.08 | 0.42 | -0.12 | -0.07 | 0.21 | -1.33 | 1 | 2.33 | -0.39 | 1.90 | 0.07 |
| G | 113 | -0.22 | 1.17 | -0.33 | -0.22 | 0.99 | -4.00 | 5 | 9.00 | 0.59 | 3.19 | 0.11 |
| K | 38 | -0.23 | 1.19 | -0.27 | -0.28 | 1.09 | -3.00 | 4 | 7.00 | 0.92 | 2.81 | 0.19 |
| M | 71 | -0.33 | 1.23 | -0.33 | -0.25 | 0.99 | -4.00 | 2 | 6.00 | -0.54 | 0.11 | 0.15 |
| Other | 245 | -0.27 | 0.90 | -0.36 | -0.31 | 0.54 | -3.00 | 3 | 6.00 | 0.61 | 2.19 | 0.06 |
| Q | 36 | -0.30 | 1.30 | 0.00 | -0.32 | 1.48 | -3.00 | 3 | 6.00 | 0.07 | -0.17 | 0.22 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#F7B065",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#FF8300") + #orange!
ylab("sentiment value")
xa_01
Mean xa_01 sentiment values appear relatively similar across customers,
while their variability differs. On average, the sentiment is positive.
Customer G looks interesting… its values span an especially wide range
(from -3 to 35, a range of 38), including an extreme value of 35!
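Extreme values like customer G's can be flagged systematically rather than by eye. Below is a minimal sketch using Tukey's fences (values more than 1.5 × IQR beyond the quartiles). The helper name and the k = 1.5 cut-off are conventions chosen here, not something the report specifies, and the toy vector only mimics the situation.

```r
# Flag values outside Tukey's fences: below Q1 - k*IQR or above Q3 + k*IQR
flag_tukey_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

# Toy values for two customers; with the report's data this would be
# df$xa_01 split by df$customer
x <- c(5, 6, 7, 7, 8, 8, 9, 10, 35, 1, 2, 3, 4, 5)
g <- c(rep("G", 9), rep("A", 5))

# Apply the rule within each customer, then restore the original row order
flags <- unsplit(lapply(split(x, g), flag_tukey_outliers), g)
x[flags]  # only the value 35 is flagged
```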
t <- df_customer_continous_inputs_summary #temp object
n <- "xa_01" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
|   | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 8.68 | 5.24 | 7.00 | 8.67 | 5.93 | -3.0 | 20.00 | 23.00 | 0.07 | -0.87 | 0.71 |
| B | 52 | 7.51 | 1.91 | 7.71 | 7.58 | 1.21 | 2.0 | 12.00 | 10.00 | -0.58 | 1.80 | 0.26 |
| D | 32 | 9.51 | 3.29 | 9.18 | 9.16 | 1.34 | 4.5 | 23.00 | 18.50 | 2.05 | 6.64 | 0.58 |
| E | 35 | 8.44 | 2.16 | 8.67 | 8.48 | 1.44 | 3.5 | 13.33 | 9.83 | -0.15 | 0.16 | 0.37 |
| G | 113 | 8.30 | 5.36 | 7.75 | 7.94 | 4.08 | -3.0 | 35.00 | 38.00 | 1.55 | 5.93 | 0.50 |
| K | 38 | 6.83 | 3.01 | 7.00 | 6.70 | 2.97 | 1.5 | 15.00 | 13.50 | 0.44 | -0.14 | 0.49 |
| M | 71 | 8.83 | 4.56 | 8.00 | 8.45 | 4.45 | 1.0 | 22.00 | 21.00 | 0.73 | 0.26 | 0.54 |
| Other | 245 | 7.75 | 3.10 | 7.96 | 7.70 | 2.00 | -2.0 | 21.00 | 23.00 | 0.51 | 2.96 | 0.20 |
| Q | 36 | 7.66 | 4.29 | 7.43 | 7.94 | 5.09 | -2.0 | 14.00 | 16.00 | -0.49 | -0.58 | 0.72 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#7FD05C",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#53A72E") + #green
ylab("sentiment value")
xa_02
Mean xa_02 sentiment values appear to differ across customers, while
their variability appears relatively similar. On average, the sentiment
is positive.
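Visual impressions such as "means differ while variability is similar" can be cross-checked with classical tests. The hedged sketch below uses simulated data (group means loosely taken from customers A, B, and E in the xa_02 table, with a common SD of 6), not the report's data. It runs Bartlett's test for equal variances, followed by a one-way ANOVA that assumes equal variances. Both tests assume roughly normal, independent observations, so the p-values are best read as descriptive.

```r
set.seed(42)
toy <- data.frame(
  customer = factor(rep(c("A", "B", "E"), each = 40)),
  value    = c(rnorm(40, 10, 6), rnorm(40, 17, 6), rnorm(40, 20, 6))
)

# Equal-variance check (Bartlett's test assumes normality within groups)
var_check <- bartlett.test(value ~ customer, data = toy)

# Classic one-way ANOVA on the group means, assuming equal variances
mean_check <- oneway.test(value ~ customer, data = toy, var.equal = TRUE)

c(bartlett_p = var_check$p.value, anova_p = mean_check$p.value)
```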
t <- df_customer_continous_inputs_summary #temp object
n <- "xa_02" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
|   | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 9.64 | 5.20 | 11.0 | 9.84 | 5.93 | -3 | 20 | 23 | -0.32 | -0.72 | 0.70 |
| B | 52 | 17.21 | 7.03 | 17.5 | 17.50 | 6.67 | 2 | 32 | 30 | -0.29 | -0.47 | 0.97 |
| D | 32 | 18.56 | 7.36 | 18.0 | 18.38 | 8.15 | 6 | 32 | 26 | 0.07 | -1.03 | 1.30 |
| E | 35 | 19.89 | 6.21 | 21.0 | 20.17 | 5.93 | 7 | 32 | 25 | -0.33 | -0.64 | 1.05 |
| G | 113 | 10.75 | 6.15 | 10.0 | 10.65 | 5.93 | -3 | 35 | 38 | 0.51 | 1.44 | 0.58 |
| K | 38 | 10.16 | 6.19 | 8.5 | 9.75 | 6.67 | 2 | 27 | 25 | 0.62 | -0.41 | 1.00 |
| M | 71 | 11.37 | 5.98 | 10.0 | 11.04 | 5.93 | 1 | 26 | 25 | 0.47 | -0.44 | 0.71 |
| Other | 245 | 14.01 | 6.69 | 14.0 | 14.01 | 7.41 | -2 | 38 | 40 | 0.15 | -0.12 | 0.43 |
| Q | 36 | 11.42 | 6.83 | 12.0 | 11.43 | 7.41 | -2 | 28 | 30 | -0.01 | -0.36 | 1.14 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#7FD05C",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#53A72E") + #green
ylab("sentiment value")
xa_03
Mean xa_03 sentiment values and their variability appear to differ
across customers.
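When both the means and the spreads appear to differ, as here, Welch's one-way ANOVA (`oneway.test()` with `var.equal = FALSE`) compares means without assuming equal variances. The sketch below uses simulated data whose group sizes, means, and SDs are copied from customers A, B, and K in the xa_03 table. The draws themselves are random and are not the report's data.

```r
set.seed(7)
toy <- data.frame(
  customer = factor(rep(c("A", "B", "K"), times = c(55, 52, 38))),
  value    = c(rnorm(55, 7.62, 6.03), rnorm(52, -0.35, 4.90),
               rnorm(38, 4.26, 3.33))
)

# Welch's ANOVA: compares group means without assuming equal variances
welch <- oneway.test(value ~ customer, data = toy, var.equal = FALSE)
welch$p.value
```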
t <- df_customer_continous_inputs_summary #temp object
n <- "xa_03" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
|   | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 7.62 | 6.03 | 7.0 | 7.62 | 7.41 | -6 | 20 | 26 | 0.04 | -0.92 | 0.81 |
| B | 52 | -0.35 | 4.90 | -0.5 | -0.64 | 5.19 | -9 | 12 | 21 | 0.48 | -0.33 | 0.68 |
| D | 32 | 2.62 | 6.20 | 2.5 | 1.81 | 6.67 | -5 | 23 | 28 | 1.31 | 1.71 | 1.10 |
| E | 35 | -0.43 | 3.85 | -1.0 | -0.48 | 2.97 | -9 | 8 | 17 | 0.19 | -0.21 | 0.65 |
| G | 113 | 5.93 | 6.32 | 5.0 | 5.36 | 4.45 | -6 | 35 | 41 | 1.44 | 3.98 | 0.59 |
| K | 38 | 4.26 | 3.33 | 4.0 | 3.97 | 2.97 | -2 | 15 | 17 | 1.08 | 1.50 | 0.54 |
| M | 71 | 6.44 | 5.36 | 6.0 | 6.04 | 4.45 | -5 | 22 | 27 | 0.72 | 0.77 | 0.64 |
| Other | 245 | 2.80 | 4.60 | 2.0 | 2.53 | 4.45 | -8 | 21 | 29 | 0.81 | 1.36 | 0.29 |
| Q | 36 | 4.22 | 4.05 | 4.0 | 4.03 | 4.45 | -2 | 14 | 16 | 0.38 | -0.46 | 0.68 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#7FD05C",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#53A72E") + #green
ylab("sentiment value")
xa_04
Mean xa_04 sentiment values and their variability appear similar across
customers. On average, the sentiment is positive.
t <- df_customer_continous_inputs_summary #temp object
n <- "xa_04" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
|   | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 3.08 | 1.64 | 3.00 | 3.04 | 1.48 | -1.50 | 7.00 | 8.50 | 0.05 | 0.46 | 0.22 |
| B | 52 | 2.87 | 1.10 | 2.75 | 2.76 | 0.37 | 0.40 | 7.00 | 6.60 | 1.88 | 6.04 | 0.15 |
| D | 32 | 3.20 | 0.83 | 3.14 | 3.14 | 0.21 | 1.60 | 7.00 | 5.40 | 2.58 | 11.18 | 0.15 |
| E | 35 | 3.03 | 0.72 | 3.06 | 3.04 | 0.28 | 1.08 | 5.33 | 4.25 | 0.10 | 2.62 | 0.12 |
| G | 113 | 3.00 | 1.78 | 3.00 | 2.97 | 1.14 | -2.00 | 10.00 | 12.00 | 0.37 | 1.81 | 0.17 |
| K | 38 | 2.78 | 1.28 | 3.00 | 2.67 | 1.02 | 0.58 | 7.00 | 6.42 | 1.02 | 1.92 | 0.21 |
| M | 71 | 3.09 | 1.68 | 3.00 | 2.93 | 1.48 | 0.50 | 12.00 | 11.50 | 2.10 | 8.92 | 0.20 |
| Other | 245 | 2.87 | 1.27 | 2.78 | 2.78 | 0.79 | -2.00 | 10.00 | 12.00 | 1.56 | 8.09 | 0.08 |
| Q | 36 | 2.79 | 1.57 | 3.05 | 2.83 | 1.39 | -0.67 | 7.00 | 7.67 | -0.11 | 0.38 | 0.26 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#7FD05C",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#53A72E") + #green
ylab("sentiment value")
xa_05
Mean xa_05 sentiment values and their variability appear to differ
across customers. On average, the sentiment is positive for most
customers; the exceptions are customers B and E, whose means are slightly
negative. Customer D's values are, on average, close to neutral.
t <- df_customer_continous_inputs_summary #temp object
n <- "xa_05" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
|   | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 2.71 | 1.94 | 3.00 | 2.69 | 1.48 | -2.0 | 7 | 9.0 | -0.07 | -0.07 | 0.26 |
| B | 52 | -0.34 | 2.79 | -0.12 | -0.32 | 2.10 | -8.0 | 7 | 15.0 | -0.23 | 1.42 | 0.39 |
| D | 32 | 0.65 | 2.25 | 1.00 | 0.55 | 2.97 | -3.0 | 7 | 10.0 | 0.48 | -0.09 | 0.40 |
| E | 35 | -0.29 | 1.93 | -0.50 | -0.41 | 2.22 | -3.0 | 4 | 7.0 | 0.41 | -0.66 | 0.33 |
| G | 113 | 2.12 | 2.14 | 2.00 | 2.04 | 1.78 | -3.0 | 10 | 13.0 | 0.48 | 1.10 | 0.20 |
| K | 38 | 2.01 | 1.51 | 1.58 | 1.85 | 1.36 | -0.4 | 7 | 7.4 | 1.26 | 2.05 | 0.24 |
| M | 71 | 2.22 | 1.83 | 2.14 | 2.17 | 1.27 | -2.0 | 12 | 14.0 | 1.84 | 9.71 | 0.22 |
| Other | 245 | 1.06 | 2.06 | 1.00 | 1.00 | 1.48 | -4.0 | 10 | 14.0 | 0.74 | 2.45 | 0.13 |
| Q | 36 | 1.64 | 1.67 | 1.50 | 1.57 | 1.83 | -1.0 | 7 | 8.0 | 0.64 | 0.94 | 0.28 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#7FD05C",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#53A72E") + #green
ylab("sentiment value")
xa_06
Mean xa_06 sentiment values and their variability appear to differ
across customers. Customers B and “Other” have the widest distributions
(ranges of 22.6 and 23.0, respectively). On average, the sentiment is
positive.
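Ranking customers by spread makes this kind of comparison explicit instead of visual. Below is a minimal sketch; `spread_by_group()` is a hypothetical helper, and sorting by range first (then SD) is a choice made here. With the report's data it would be called as `spread_by_group(df$xa_06, df$customer)`.

```r
# Hypothetical helper: SD and range per group, widest range first
spread_by_group <- function(x, g) {
  sds    <- tapply(x, g, sd)
  ranges <- tapply(x, g, function(v) diff(range(v)))
  out <- data.frame(sd = as.vector(sds), range = as.vector(ranges),
                    row.names = names(sds))
  out[order(-out$range, -out$sd), ]
}

# Toy values for three groups
x <- c(1, 2, 3, 0, 10, 20, 5, 6, 9)
g <- rep(c("A", "B", "C"), each = 3)
spread_by_group(x, g)  # rows ordered B, C, A
```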
t <- df_customer_continous_inputs_summary #temp object
n <- "xa_06" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
|   | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 3.41 | 1.81 | 3.67 | 3.37 | 0.99 | -1.50 | 9 | 10.50 | 0.27 | 1.13 | 0.24 |
| B | 52 | 8.09 | 4.55 | 7.25 | 7.75 | 3.89 | 0.40 | 23 | 22.60 | 0.85 | 0.75 | 0.63 |
| D | 32 | 6.58 | 3.25 | 6.60 | 6.23 | 2.97 | 2.00 | 16 | 14.00 | 1.01 | 1.14 | 0.57 |
| E | 35 | 7.65 | 3.39 | 7.00 | 7.34 | 2.97 | 2.67 | 17 | 14.33 | 0.87 | 0.51 | 0.57 |
| G | 113 | 4.06 | 2.52 | 4.00 | 3.92 | 1.98 | -2.00 | 12 | 14.00 | 0.62 | 1.02 | 0.24 |
| K | 38 | 3.64 | 1.88 | 3.83 | 3.52 | 1.48 | 0.67 | 9 | 8.33 | 0.54 | 0.09 | 0.30 |
| M | 71 | 4.10 | 2.40 | 3.50 | 3.83 | 1.85 | 0.50 | 12 | 11.50 | 1.24 | 1.70 | 0.29 |
| Other | 245 | 5.54 | 3.40 | 5.00 | 5.14 | 2.97 | -2.00 | 21 | 23.00 | 1.38 | 2.85 | 0.22 |
| Q | 36 | 4.25 | 2.88 | 4.00 | 4.10 | 1.63 | -0.67 | 14 | 14.67 | 0.92 | 1.80 | 0.48 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#7FD05C",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#53A72E") + #green
ylab("sentiment value")
xa_07
Mean xa_07 sentiment values appear similar across customers, while their
variability appears different. On average, the sentiment is positive.
t <- df_customer_continous_inputs_summary #temp object
n <- "xa_07" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
|   | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 4.94 | 2.41 | 4.50 | 4.86 | 2.22 | -1.00 | 10.00 | 11.00 | 0.31 | -0.13 | 0.32 |
| B | 52 | 4.58 | 1.08 | 4.51 | 4.51 | 0.41 | 2.00 | 9.00 | 7.00 | 1.41 | 4.97 | 0.15 |
| D | 32 | 4.97 | 0.79 | 5.02 | 4.99 | 0.36 | 3.17 | 7.00 | 3.83 | -0.27 | 0.56 | 0.14 |
| E | 35 | 4.76 | 0.83 | 4.81 | 4.75 | 0.48 | 2.67 | 6.67 | 4.00 | 0.17 | 0.48 | 0.14 |
| G | 113 | 4.77 | 2.14 | 4.50 | 4.62 | 1.48 | -2.00 | 13.00 | 15.00 | 0.85 | 3.46 | 0.20 |
| K | 38 | 4.47 | 1.54 | 4.10 | 4.41 | 1.63 | 2.00 | 9.00 | 7.00 | 0.58 | 0.04 | 0.25 |
| M | 71 | 4.83 | 1.98 | 4.67 | 4.64 | 1.98 | 2.00 | 12.00 | 10.00 | 1.15 | 1.95 | 0.23 |
| Other | 245 | 4.54 | 1.47 | 4.50 | 4.46 | 1.06 | -2.00 | 12.00 | 14.00 | 0.83 | 5.00 | 0.09 |
| Q | 36 | 5.02 | 1.92 | 5.00 | 5.05 | 1.48 | 1.00 | 11.00 | 10.00 | 0.24 | 1.21 | 0.32 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#7FD05C",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#53A72E") + #green
ylab("sentiment value")
xa_08
Mean xa_08 sentiment values appear similar across customers, while their
variability appears different. On average, the sentiment is positive.
t <- df_customer_continous_inputs_summary #temp object
n <- "xa_08" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
|   | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 1.29 | 2.28 | 2.00 | 1.31 | 1.48 | -5.0 | 7.0 | 12 | -0.15 | 0.55 | 0.31 |
| B | 52 | 1.08 | 1.46 | 0.98 | 0.96 | 0.36 | -3.0 | 7.0 | 10 | 1.53 | 6.26 | 0.20 |
| D | 32 | 1.48 | 1.47 | 1.37 | 1.48 | 0.55 | -3.0 | 7.0 | 10 | 0.66 | 6.12 | 0.26 |
| E | 35 | 1.30 | 0.97 | 1.34 | 1.31 | 0.46 | -1.5 | 4.5 | 6 | 0.23 | 3.43 | 0.16 |
| G | 113 | 1.24 | 2.15 | 1.33 | 1.25 | 1.98 | -5.0 | 10.0 | 15 | 0.25 | 2.37 | 0.20 |
| K | 38 | 1.29 | 2.13 | 1.12 | 1.34 | 1.30 | -5.0 | 7.0 | 12 | -0.24 | 1.80 | 0.35 |
| M | 71 | 1.43 | 2.29 | 1.64 | 1.31 | 2.02 | -3.0 | 12.0 | 15 | 1.31 | 4.76 | 0.27 |
| Other | 245 | 1.16 | 1.70 | 1.02 | 1.10 | 1.26 | -4.0 | 10.0 | 14 | 1.28 | 6.24 | 0.11 |
| Q | 36 | 0.93 | 2.18 | 1.00 | 0.91 | 1.73 | -4.0 | 7.0 | 11 | 0.12 | 0.41 | 0.36 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#7FD05C",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#53A72E") + #green
ylab("sentiment value")
xw_01
Mean xw_01 sentiment values and their variability differ across
customers, and so do the shapes of the distributions. For example,
customer E’s distribution looks roughly Gaussian (it has a single clear
peak), while customer A’s distribution looks “flat” with no clear peak.
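The "Gaussian-like" versus "flat" contrast is what the skew and kurtosis columns quantify: a flat distribution has negative excess kurtosis (customer A, -0.82), while customer E's is near 0 (-0.05). Below is a sketch of the plain moment-based estimators (g1 and g2). Per the psych documentation, `describe()` defaults to `type = 3`, which rescales these using the sample standard deviation, so its values will differ slightly from the ones computed here, especially for small groups.

```r
# Moment-based sample skewness (g1) and excess kurtosis (g2)
moment_shape <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  m2 <- mean(d^2)  # divide-by-n variance
  c(skew     = mean(d^3) / m2^1.5,
    kurtosis = mean(d^4) / m2^2 - 3)
}

moment_shape(1:10)                  # flat: skew 0, kurtosis about -1.22
moment_shape(c(0, 0, 0, 0, 1, 10))  # long right tail: positive skew
```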
t <- df_customer_continous_inputs_summary #temp object
n <- "xw_01" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
|   | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 58.14 | 24.18 | 58.50 | 57.69 | 27.43 | 11.00 | 104 | 93.00 | 0.11 | -0.82 | 3.26 |
| B | 52 | 60.50 | 14.72 | 62.19 | 62.05 | 5.59 | 11.00 | 94 | 83.00 | -1.29 | 2.98 | 2.04 |
| D | 32 | 57.89 | 16.38 | 55.60 | 56.75 | 11.12 | 23.00 | 106 | 83.00 | 0.93 | 2.01 | 2.90 |
| E | 35 | 51.07 | 9.51 | 53.15 | 51.47 | 7.00 | 28.00 | 70 | 42.00 | -0.46 | -0.05 | 1.61 |
| G | 113 | 56.19 | 21.83 | 53.74 | 55.50 | 19.66 | 14.00 | 103 | 89.00 | 0.26 | -0.58 | 2.05 |
| K | 38 | 49.09 | 21.22 | 45.50 | 48.27 | 16.31 | 12.00 | 93 | 81.00 | 0.55 | -0.28 | 3.44 |
| M | 71 | 60.04 | 25.78 | 64.00 | 60.66 | 33.36 | 11.75 | 102 | 90.25 | -0.14 | -1.09 | 3.06 |
| Other | 245 | 57.57 | 18.01 | 58.83 | 58.02 | 12.87 | 9.00 | 108 | 99.00 | -0.27 | 0.39 | 1.15 |
| Q | 36 | 56.52 | 24.91 | 53.67 | 56.48 | 29.53 | 14.00 | 98 | 84.00 | 0.11 | -1.23 | 4.15 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#55C6E8",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#1BB1DE") + #blue
ylab("sentiment value")
xw_02
Mean xw_02 sentiment values and their variability differ across
customers, and so do the shapes of the distributions. For example,
customer E’s distribution is strongly right-skewed: most of its values
sit at 0 (its median is 0), with a long tail toward larger values.
Customer A’s distribution, by contrast, looks “flat” with no clear peak.
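Since the minimum of xw_02 is 0, customer E's median of 0 means that at least half of its values are exactly 0. The sketch below shows a quick way to check how much of each customer's distribution sits at zero. `zero_share()` is a hypothetical helper, and the toy vector is illustrative only; with the report's data the call would be `zero_share(df$xw_02, df$customer)`.

```r
# Hypothetical helper: proportion of exact zeros within each group
zero_share <- function(x, g) tapply(x == 0, g, mean, na.rm = TRUE)

x <- c(0, 0, 0, 5, 62, 10, 20, 0, 40)
g <- c(rep("E", 5), rep("A", 4))
zero_share(x, g)  # A: 0.25, E: 0.60
```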
t <- df_customer_continous_inputs_summary #temp object
n <- "xw_02" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
|   | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 52.91 | 27.02 | 51.0 | 52.09 | 31.13 | 7 | 104 | 97 | 0.21 | -0.95 | 3.64 |
| B | 52 | 15.71 | 22.08 | 6.5 | 11.60 | 9.64 | 0 | 94 | 94 | 1.58 | 1.87 | 3.06 |
| D | 32 | 21.56 | 26.37 | 13.0 | 16.42 | 19.27 | 0 | 106 | 106 | 1.86 | 3.26 | 4.66 |
| E | 35 | 7.49 | 13.33 | 0.0 | 4.76 | 0.00 | 0 | 62 | 62 | 2.31 | 5.84 | 2.25 |
| G | 113 | 40.88 | 28.33 | 36.0 | 38.93 | 28.17 | 0 | 103 | 103 | 0.60 | -0.56 | 2.66 |
| K | 38 | 33.89 | 28.10 | 25.0 | 31.66 | 19.27 | 0 | 93 | 93 | 0.80 | -0.51 | 4.56 |
| M | 71 | 45.99 | 31.20 | 40.0 | 45.16 | 35.58 | 0 | 102 | 102 | 0.32 | -1.09 | 3.70 |
| Other | 245 | 25.87 | 26.40 | 18.0 | 21.77 | 26.69 | 0 | 108 | 108 | 1.11 | 0.46 | 1.69 |
| Q | 36 | 38.47 | 30.96 | 32.0 | 36.50 | 26.69 | 0 | 98 | 98 | 0.57 | -0.85 | 5.16 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#55C6E8",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#1BB1DE") + #blue
ylab("sentiment value")
xw_03
Mean xw_03 sentiment values and their variability differ across
customers, and so do the shapes of the distributions. For example,
customer B’s distribution is left-skewed: most of its values are
relatively large (median 103), with a long tail toward smaller values.
Customer A’s distribution, by contrast, looks “flat” with no clear peak.
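The direction of skew can also be read off the summary table. When the mean sits below the median, the long tail points toward smaller values. Pearson's second (median) skewness coefficient, 3(mean - median)/SD, turns that observation into a number. Applied to customer B's row of the xw_03 table (mean 93.42, median 103, SD 24.39), it comes out clearly negative:

```r
# Pearson's second skewness coefficient: 3 * (mean - median) / sd
pearson_median_skew <- function(mean, median, sd) 3 * (mean - median) / sd

pearson_median_skew(93.42, 103, 24.39)  # about -1.18: left (negative) skew
```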
t <- df_customer_continous_inputs_summary #temp object
n <- "xw_03" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
|   | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 63.76 | 26.17 | 66 | 64.38 | 35.58 | 11 | 106 | 95 | -0.15 | -1.05 | 3.53 |
| B | 52 | 93.42 | 24.39 | 103 | 99.33 | 5.93 | 11 | 112 | 101 | -2.14 | 3.59 | 3.38 |
| D | 32 | 91.47 | 23.24 | 103 | 95.92 | 5.19 | 23 | 109 | 86 | -1.55 | 1.18 | 4.11 |
| E | 35 | 95.06 | 18.47 | 100 | 99.00 | 4.45 | 33 | 113 | 80 | -2.20 | 4.01 | 3.12 |
| G | 113 | 71.23 | 27.23 | 82 | 73.35 | 26.69 | 14 | 110 | 96 | -0.50 | -1.18 | 2.56 |
| K | 38 | 66.66 | 26.99 | 65 | 67.97 | 37.81 | 16 | 104 | 88 | -0.32 | -1.17 | 4.38 |
| M | 71 | 71.82 | 29.09 | 84 | 74.26 | 23.72 | 13 | 109 | 96 | -0.54 | -1.21 | 3.45 |
| Other | 245 | 84.31 | 25.84 | 96 | 88.75 | 10.38 | 9 | 110 | 101 | -1.33 | 0.52 | 1.65 |
| Q | 36 | 71.53 | 27.78 | 76 | 73.63 | 32.62 | 14 | 105 | 91 | -0.50 | -1.14 | 4.63 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#55C6E8",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#1BB1DE") + #blue
ylab("sentiment value")
sentimentr
xs_01
Mean xs_01 sentiment values are similar across customers, while their
variability differs. Overall, the sentiment is positive.
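Comparing the sd and mad columns is a quick check for heavy tails. By default, R's `mad()` multiplies the median absolute deviation by 1.4826 so that it estimates the SD for normal data. An SD well above the MAD (e.g., customer B in the xs_01 table: 0.10 vs. 0.04) therefore suggests a few extreme reports rather than uniformly higher variability. This reading assumes the table's mad column comes from `stats::mad()` with its default constant, which is presumably what psych's `describe()` uses. A toy illustration:

```r
# One extreme value inflates the SD but barely moves the scaled MAD
x <- c(1, 2, 3, 4, 100)
c(sd = sd(x), mad = mad(x))  # sd about 43.6, mad = 1.4826
```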
t <- df_customer_continous_inputs_summary #temp object
n <- "xs_01" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
|   | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 0.22 | 0.20 | 0.24 | 0.22 | 0.15 | -0.22 | 0.72 | 0.94 | 0.06 | 0.26 | 0.03 |
| B | 52 | 0.19 | 0.10 | 0.19 | 0.19 | 0.04 | -0.14 | 0.49 | 0.63 | -0.47 | 3.06 | 0.01 |
| D | 32 | 0.25 | 0.09 | 0.25 | 0.25 | 0.03 | 0.09 | 0.45 | 0.36 | 0.58 | 0.13 | 0.02 |
| E | 35 | 0.23 | 0.06 | 0.23 | 0.23 | 0.04 | 0.04 | 0.38 | 0.34 | -0.46 | 1.26 | 0.01 |
| G | 113 | 0.22 | 0.16 | 0.25 | 0.23 | 0.13 | -0.36 | 0.58 | 0.94 | -0.91 | 1.83 | 0.01 |
| K | 38 | 0.22 | 0.16 | 0.25 | 0.22 | 0.15 | -0.19 | 0.53 | 0.72 | -0.47 | 0.06 | 0.03 |
| M | 71 | 0.23 | 0.16 | 0.22 | 0.23 | 0.13 | -0.11 | 0.75 | 0.87 | 0.63 | 0.95 | 0.02 |
| Other | 245 | 0.20 | 0.11 | 0.20 | 0.20 | 0.08 | -0.08 | 0.68 | 0.76 | 0.50 | 2.13 | 0.01 |
| Q | 36 | 0.22 | 0.16 | 0.23 | 0.23 | 0.14 | -0.18 | 0.52 | 0.70 | -0.39 | 0.07 | 0.03 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#9262E2",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#793FDA") + #purple
ylab("sentiment value")
xs_02
Mean xs_02 sentiment values and their variability differ across
customers, and the overall sentiment is mixed: customers B, D, E, and
“Other” have negative sentiment on average (only barely so for “Other”),
while the remaining customers (A, G, K, M, and Q) have positive sentiment
on average.
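Whether a customer's average is meaningfully below zero, or only barely so, can be gauged with a confidence interval for each customer's mean. Below is a hedged sketch using `t.test()`. `mean_ci_by_group()` is a hypothetical helper, the toy values are illustrative, and the intervals assume independent, roughly normal reports within each customer, which may not hold here.

```r
# Hypothetical helper: group mean with a t-based confidence interval
mean_ci_by_group <- function(x, g, level = 0.95) {
  groups <- split(x, g)
  out <- do.call(rbind, lapply(groups, function(v) {
    ci <- t.test(v, conf.level = level)$conf.int
    data.frame(mean = mean(v), lower = ci[1], upper = ci[2])
  }))
  row.names(out) <- names(groups)
  out
}

x <- c(-0.5, -0.3, -0.2, -0.1, -0.4,    # clearly below zero
       -0.1, 0.1, 0.05, -0.05, -0.02)   # centred near zero
g <- rep(c("B", "Other"), each = 5)
mean_ci_by_group(x, g)
```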
t <- df_customer_continous_inputs_summary #temp object
n <- "xs_02" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
|   | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 0.17 | 0.19 | 0.17 | 0.18 | 0.19 | -0.23 | 0.67 | 0.90 | 0.00 | 0.40 | 0.03 |
| B | 52 | -0.20 | 0.28 | -0.21 | -0.21 | 0.23 | -0.73 | 0.49 | 1.22 | 0.40 | -0.36 | 0.04 |
| D | 32 | -0.08 | 0.28 | -0.09 | -0.07 | 0.20 | -0.64 | 0.45 | 1.09 | 0.04 | -0.53 | 0.05 |
| E | 35 | -0.21 | 0.27 | -0.19 | -0.20 | 0.28 | -0.90 | 0.27 | 1.17 | -0.46 | -0.16 | 0.05 |
| G | 113 | 0.13 | 0.18 | 0.12 | 0.13 | 0.18 | -0.36 | 0.49 | 0.85 | -0.20 | -0.26 | 0.02 |
| K | 38 | 0.10 | 0.18 | 0.11 | 0.10 | 0.16 | -0.46 | 0.53 | 0.99 | -0.33 | 1.04 | 0.03 |
| M | 71 | 0.14 | 0.18 | 0.13 | 0.13 | 0.19 | -0.24 | 0.69 | 0.93 | 0.33 | 0.23 | 0.02 |
| Other | 245 | -0.02 | 0.23 | -0.02 | -0.03 | 0.25 | -0.59 | 0.68 | 1.27 | 0.19 | -0.20 | 0.01 |
| Q | 36 | 0.09 | 0.19 | 0.10 | 0.08 | 0.15 | -0.37 | 0.52 | 0.90 | 0.23 | 0.11 | 0.03 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#9262E2",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#793FDA") + #purple
ylab("sentiment value")
xs_03
Mean xs_03 sentiment values differ across customers, while their
variability is similar. Overall, the sentiment is positive.
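Because several of these features are skewed, a rank-based alternative to ANOVA may be preferable when comparing customers on a feature like xs_03. Below is a minimal sketch of the Kruskal-Wallis test (`stats::kruskal.test()`) on toy data, not the report's data. With the report's data the formula would be `xs_03 ~ customer`.

```r
# Toy values: three customers whose values are shifted apart
toy <- data.frame(
  customer = factor(rep(c("A", "B", "E"), each = 10)),
  value    = c((1:10) / 10, (11:20) / 10, (21:30) / 10)
)

# Rank-based comparison of the groups' locations
kw <- kruskal.test(value ~ customer, data = toy)
kw$p.value  # very small: the toy groups do not overlap at all
```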
t <- df_customer_continous_inputs_summary #temp object
n <- "xs_03" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
|   | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 0.27 | 0.25 | 0.26 | 0.26 | 0.17 | -0.22 | 1.21 | 1.43 | 0.94 | 2.43 | 0.03 |
| B | 52 | 0.61 | 0.33 | 0.59 | 0.60 | 0.33 | -0.13 | 1.41 | 1.54 | 0.24 | -0.04 | 0.05 |
| D | 32 | 0.64 | 0.29 | 0.54 | 0.62 | 0.28 | 0.24 | 1.20 | 0.96 | 0.52 | -1.01 | 0.05 |
| E | 35 | 0.74 | 0.33 | 0.72 | 0.71 | 0.23 | 0.15 | 1.79 | 1.64 | 0.91 | 1.32 | 0.06 |
| G | 113 | 0.33 | 0.25 | 0.32 | 0.32 | 0.21 | -0.36 | 1.28 | 1.64 | 0.54 | 1.78 | 0.02 |
| K | 38 | 0.34 | 0.23 | 0.36 | 0.35 | 0.23 | -0.19 | 0.73 | 0.93 | -0.29 | -0.77 | 0.04 |
| M | 71 | 0.33 | 0.22 | 0.29 | 0.31 | 0.18 | -0.11 | 1.17 | 1.28 | 0.99 | 1.76 | 0.03 |
| Other | 245 | 0.44 | 0.25 | 0.41 | 0.43 | 0.26 | -0.05 | 1.45 | 1.50 | 0.67 | 0.74 | 0.02 |
| Q | 36 | 0.36 | 0.24 | 0.33 | 0.36 | 0.19 | -0.18 | 1.02 | 1.19 | 0.16 | 0.57 | 0.04 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#9262E2",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#793FDA") + #purple
ylab("sentiment value")
xs_04
Mean xs_04 sentiment values are similar across customers, while their
variability differs. Overall, the sentiment is positive.
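For the pattern "similar means, different variability", the Fligner-Killeen test compares spreads across groups. R's documentation describes it as among the most robust of these tests against departures from normality. Below is a hedged sketch on toy data with a shared center and different spreads, not the report's data.

```r
# Toy values: same centre (0.3), different spread
spread <- seq(-1, 1, length.out = 20)
toy <- data.frame(
  customer = factor(rep(c("E", "Q"), each = 20)),
  value    = c(0.3 + 0.05 * spread, 0.3 + 0.5 * spread)
)

# Fligner-Killeen (median-centred) test of equal variability
fk <- fligner.test(value ~ customer, data = toy)
fk$p.value
```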
t <- df_customer_continous_inputs_summary #temp object
n <- "xs_04" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
|   | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 0.31 | 0.13 | 0.29 | 0.30 | 0.11 | 0.05 | 0.69 | 0.64 | 0.87 | 0.59 | 0.02 |
| B | 52 | 0.28 | 0.06 | 0.28 | 0.28 | 0.04 | 0.12 | 0.54 | 0.42 | 1.17 | 3.93 | 0.01 |
| D | 32 | 0.34 | 0.10 | 0.32 | 0.32 | 0.05 | 0.14 | 0.65 | 0.50 | 1.44 | 2.70 | 0.02 |
| E | 35 | 0.32 | 0.06 | 0.32 | 0.32 | 0.03 | 0.18 | 0.52 | 0.34 | 0.97 | 3.13 | 0.01 |
| G | 113 | 0.31 | 0.12 | 0.29 | 0.30 | 0.09 | 0.00 | 0.75 | 0.75 | 0.74 | 1.34 | 0.01 |
| K | 38 | 0.30 | 0.13 | 0.30 | 0.30 | 0.12 | 0.03 | 0.60 | 0.57 | 0.07 | 0.02 | 0.02 |
| M | 71 | 0.30 | 0.13 | 0.28 | 0.30 | 0.10 | 0.01 | 0.68 | 0.66 | 0.59 | 0.48 | 0.01 |
| Other | 245 | 0.29 | 0.09 | 0.28 | 0.28 | 0.06 | 0.04 | 0.62 | 0.57 | 0.64 | 1.73 | 0.01 |
| Q | 36 | 0.33 | 0.15 | 0.29 | 0.31 | 0.08 | 0.10 | 0.90 | 0.80 | 1.84 | 4.00 | 0.02 |
df %>%
select(customer, all_of(n)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#9262E2",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#793FDA") + #purple
ylab("sentiment value")
xs_05
Mean xs_05 sentiment values and their variability differ across
customers. Overall, the sentiment is positive. Some customers’
distributions have a clear peak near the center (e.g., customer Q), while
others are right-skewed, with most values piled up near zero and a long
tail toward larger values (e.g., customer B).
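Whether a distribution peaks near its center or piles up near zero can be checked by locating the mode of a kernel density estimate, which is essentially what the violin outlines display. Below is a minimal sketch using `stats::density()` at its default bandwidth. `density_mode()` is a hypothetical helper, and the toy vector is illustrative: one large value pulls the mean up while the mode stays at the peak.

```r
# Hypothetical helper: location of the highest point of a kernel density estimate
density_mode <- function(x) {
  d <- density(x[!is.na(x)])
  d$x[which.max(d$y)]
}

x <- c(0.18, 0.2, 0.2, 0.2, 0.2, 0.2, 0.22, 0.9)
c(mode = density_mode(x), mean = mean(x))  # mode near 0.2; mean 0.2875
```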
t <- df_customer_continous_inputs_summary #temp object
n <- "xs_05" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
|   | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 0.28 | 0.12 | 0.24 | 0.26 | 0.09 | 0.05 | 0.69 | 0.64 | 1.29 | 1.76 | 0.02 |
| B | 52 | 0.09 | 0.12 | 0.05 | 0.07 | 0.07 | 0.00 | 0.54 | 0.54 | 2.00 | 3.93 | 0.02 |
| D | 32 | 0.16 | 0.16 | 0.09 | 0.13 | 0.11 | 0.00 | 0.65 | 0.64 | 1.51 | 1.49 | 0.03 |
| E | 35 | 0.09 | 0.10 | 0.06 | 0.07 | 0.07 | 0.00 | 0.44 | 0.44 | 1.88 | 3.26 | 0.02 |
| G | 113 | 0.24 | 0.14 | 0.22 | 0.23 | 0.13 | 0.00 | 0.60 | 0.60 | 0.61 | -0.25 | 0.01 |
| K | 38 | 0.24 | 0.15 | 0.22 | 0.23 | 0.16 | 0.03 | 0.60 | 0.57 | 0.69 | -0.29 | 0.02 |
| M | 71 | 0.24 | 0.15 | 0.20 | 0.22 | 0.13 | 0.01 | 0.68 | 0.66 | 0.81 | 0.21 | 0.02 |
| Other | 245 | 0.15 | 0.11 | 0.13 | 0.14 | 0.10 | 0.00 | 0.56 | 0.56 | 1.04 | 0.65 | 0.01 |
| Q | 36 | 0.24 | 0.18 | 0.22 | 0.22 | 0.13 | 0.03 | 0.90 | 0.87 | 1.75 | 3.76 | 0.03 |
df %>%
select(customer, all_of(n)) %>% #n holds a column name as a string
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#9262E2",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#793FDA") + #purple
ylab("sentiment value")
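xs_06
Mean xs_06 sentiment values and their variability differ across
customers. Overall, the sentiment is positive. Customers B, D, and E
have noticeably higher means (0.62 to 0.70) than the other customers
(0.35 to 0.48).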
t <- df_customer_continous_inputs_summary #temp object
n <- "xs_06" #which row
df_customer_continous_inputs_summary_xb_tn <- rbind(t$A[n,], t$B[n,], t$D[n,],
t$E[n,], t$G[n,], t$K[n,],
t$M[n,], t$Other[n,], t$Q[n,]) %>%
select(-vars)
row.names(df_customer_continous_inputs_summary_xb_tn) = customer_labels
kable(df_customer_continous_inputs_summary_xb_tn, digits = 2)
| customer | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 55 | 0.35 | 0.19 | 0.30 | 0.32 | 0.14 | 0.05 | 1.23 | 1.18 | 1.96 | 6.07 | 0.03 |
| B | 52 | 0.62 | 0.24 | 0.61 | 0.61 | 0.26 | 0.12 | 1.24 | 1.12 | 0.14 | -0.32 | 0.03 |
| D | 32 | 0.67 | 0.28 | 0.63 | 0.65 | 0.28 | 0.17 | 1.31 | 1.14 | 0.47 | -0.57 | 0.05 |
| E | 35 | 0.70 | 0.21 | 0.68 | 0.70 | 0.22 | 0.28 | 1.17 | 0.89 | 0.14 | -0.75 | 0.04 |
| G | 113 | 0.39 | 0.17 | 0.38 | 0.38 | 0.15 | 0.00 | 1.27 | 1.27 | 1.38 | 5.00 | 0.02 |
| K | 38 | 0.39 | 0.20 | 0.37 | 0.39 | 0.18 | 0.03 | 0.88 | 0.85 | 0.26 | -0.39 | 0.03 |
| M | 71 | 0.38 | 0.17 | 0.39 | 0.37 | 0.18 | 0.01 | 0.90 | 0.89 | 0.43 | -0.01 | 0.02 |
| Other | 245 | 0.48 | 0.21 | 0.46 | 0.47 | 0.21 | 0.04 | 1.15 | 1.10 | 0.42 | -0.21 | 0.01 |
| Q | 36 | 0.44 | 0.19 | 0.40 | 0.43 | 0.18 | 0.10 | 0.90 | 0.80 | 0.65 | -0.41 | 0.03 |
df %>%
select(customer, all_of(n)) %>% #n holds a column name as a string
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "customer")) %>%
ggplot(mapping = aes(x = customer,
y = value)) +
geom_violin() +
geom_jitter(shape=16,
position=position_jitter(0.2),
color = "#9262E2",
alpha = .33) +
stat_summary(fun.data=data_summary, #display mean, and +/- 1 sd
geom = "pointrange",
color = "#793FDA") + #purple
ylab("sentiment value")
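The per-customer tables above are built by binding one row per customer by hand (`rbind(t$A[n,], t$B[n,], ...)`), and that list has to stay in sync with `customer_labels`. Below is a minimal base-R sketch that computes a subset of the same statistics (n, mean, sd, median) for any grouping variable. `summarise_by_group()` is a hypothetical helper, and the toy data are invented.

```r
# Hypothetical helper: one row per group with n, mean, sd, and median
# of a single numeric feature.
summarise_by_group <- function(values, groups) {
  g <- factor(groups)
  data.frame(
    n         = as.vector(table(g)),
    mean      = as.vector(tapply(values, g, mean)),
    sd        = as.vector(tapply(values, g, sd)),
    median    = as.vector(tapply(values, g, median)),
    row.names = levels(g)
  )
}

# Toy data, not the PPG data
toy <- data.frame(customer = c("A", "A", "B", "B", "B"),
                  xs_06    = c(0.30, 0.40, 0.60, 0.70, 0.80))
summarise_by_group(toy$xs_06, toy$customer)
```

New customers then appear automatically as new factor levels, so no hand-maintained list of rows is needed.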
Within and across lexicons or types of sentiment-derived features (e.g., Bing, NRC), many sentiment variables are substantially correlated with each other, some positively (e.g., xa_01 and xb_01, r = 0.81) and some negatively (e.g., xs_02 and xs_06, r = -0.51). Positive correlations outnumber negative ones.
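The claim that positive correlations outnumber negative ones can be checked by counting signs in the upper triangle of the numeric correlation matrix. This has to happen before the lower triangle is overwritten with "-", which turns the table into text. Here is a sketch on simulated data. With the real data, `cm` would be the `cor()` of the `x` columns.

```r
# Simulated features (not the PPG data): f1 and f2 share a common
# signal, f3 carries the same signal with the opposite sign.
set.seed(1)
z <- rnorm(50)
x <- cbind(f1 =  z + rnorm(50, sd = 0.5),
           f2 =  z + rnorm(50, sd = 0.5),
           f3 = -z + rnorm(50, sd = 0.5))

cm <- cor(x)
u  <- cm[upper.tri(cm)]  # each pair once, diagonal excluded
c(positive = sum(u > 0), negative = sum(u < 0))
```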
corr_inputs <- df[,order(names(df))] %>% #reorder for my brain
select(starts_with("x")) %>%
cor() %>%
as.data.frame() %>%
round(digits = 2) #round numbers to 2 decimal places
corr_inputs[lower.tri(corr_inputs)] <- "-"
kable(corr_inputs, digits = 2)
| | xa_01 | xa_02 | xa_03 | xa_04 | xa_05 | xa_06 | xa_07 | xa_08 | xb_01 | xb_02 | xb_03 | xb_04 | xb_05 | xb_06 | xb_07 | xb_08 | xn_01 | xn_02 | xn_03 | xn_04 | xn_05 | xn_06 | xn_07 | xn_08 | xs_01 | xs_02 | xs_03 | xs_04 | xs_05 | xs_06 | xw_01 | xw_02 | xw_03 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| xa_01 | 1 | 0.6 | 0.66 | 0.57 | 0.38 | 0.24 | 0.69 | 0.31 | 0.81 | 0.52 | 0.53 | 0.48 | 0.33 | 0.22 | 0.54 | 0.24 | 0.55 | 0.35 | 0.35 | 0.31 | 0.21 | 0.17 | 0.39 | 0.12 | 0.4 | 0.23 | 0.19 | 0.1 | 0.09 | 0.05 | 0.28 | 0.16 | 0.20 |
| xa_02 | - | 1 | -0.14 | 0.3 | -0.31 | 0.67 | 0.39 | 0.13 | 0.48 | 0.87 | -0.21 | 0.25 | -0.32 | 0.64 | 0.29 | 0.11 | 0.36 | 0.76 | -0.28 | 0.18 | -0.35 | 0.6 | 0.24 | 0.06 | 0.23 | -0.41 | 0.64 | 0 | -0.44 | 0.53 | 0.19 | -0.39 | 0.57 |
| xa_03 | - | - | 1 | 0.44 | 0.81 | -0.31 | 0.48 | 0.28 | 0.55 | -0.16 | 0.86 | 0.36 | 0.73 | -0.32 | 0.39 | 0.2 | 0.34 | -0.26 | 0.69 | 0.21 | 0.59 | -0.33 | 0.26 | 0.09 | 0.29 | 0.67 | -0.34 | 0.11 | 0.53 | -0.44 | 0.13 | 0.55 | -0.31 |
| xa_04 | - | - | - | 1 | 0.63 | 0.47 | 0.75 | 0.85 | 0.44 | 0.23 | 0.34 | 0.7 | 0.47 | 0.36 | 0.47 | 0.6 | 0.31 | 0.15 | 0.24 | 0.45 | 0.3 | 0.27 | 0.3 | 0.43 | 0.45 | 0.26 | 0.22 | 0.04 | 0.06 | -0.01 | -0.2 | -0.1 | -0.16 |
| xa_05 | - | - | - | - | 1 | -0.27 | 0.47 | 0.55 | 0.3 | -0.33 | 0.71 | 0.45 | 0.81 | -0.3 | 0.32 | 0.38 | 0.17 | -0.38 | 0.61 | 0.27 | 0.65 | -0.32 | 0.18 | 0.26 | 0.27 | 0.69 | -0.38 | 0.07 | 0.5 | -0.49 | -0.14 | 0.38 | -0.52 |
| xa_06 | - | - | - | - | - | 1 | 0.35 | 0.38 | 0.17 | 0.6 | -0.37 | 0.32 | -0.32 | 0.84 | 0.2 | 0.28 | 0.16 | 0.59 | -0.39 | 0.22 | -0.36 | 0.73 | 0.16 | 0.2 | 0.21 | -0.42 | 0.66 | -0.04 | -0.45 | 0.52 | -0.07 | -0.51 | 0.37 |
| xa_07 | - | - | - | - | - | - | 1 | 0.34 | 0.52 | 0.3 | 0.36 | 0.56 | 0.37 | 0.29 | 0.69 | 0.25 | 0.33 | 0.2 | 0.22 | 0.35 | 0.24 | 0.21 | 0.41 | 0.2 | 0.37 | 0.22 | 0.18 | 0.18 | 0.14 | 0.09 | 0.09 | 0.06 | 0.05 |
| xa_08 | - | - | - | - | - | - | - | 1 | 0.24 | 0.09 | 0.22 | 0.57 | 0.39 | 0.28 | 0.18 | 0.66 | 0.19 | 0.06 | 0.18 | 0.37 | 0.26 | 0.2 | 0.12 | 0.46 | 0.36 | 0.22 | 0.16 | -0.06 | 0.01 | -0.08 | -0.36 | -0.18 | -0.29 |
| xb_01 | - | - | - | - | - | - | - | - | 1 | 0.62 | 0.66 | 0.67 | 0.46 | 0.3 | 0.69 | 0.42 | 0.55 | 0.33 | 0.36 | 0.32 | 0.23 | 0.16 | 0.38 | 0.15 | 0.48 | 0.28 | 0.22 | 0.09 | 0.1 | 0.03 | 0.24 | 0.15 | 0.16 |
| xb_02 | - | - | - | - | - | - | - | - | - | 1 | -0.11 | 0.38 | -0.23 | 0.69 | 0.4 | 0.22 | 0.38 | 0.74 | -0.25 | 0.19 | -0.33 | 0.58 | 0.24 | 0.08 | 0.28 | -0.36 | 0.64 | 0 | -0.42 | 0.52 | 0.21 | -0.37 | 0.57 |
| xb_03 | - | - | - | - | - | - | - | - | - | - | 1 | 0.48 | 0.86 | -0.28 | 0.48 | 0.31 | 0.33 | -0.28 | 0.72 | 0.22 | 0.61 | -0.35 | 0.24 | 0.12 | 0.34 | 0.71 | -0.33 | 0.12 | 0.53 | -0.45 | 0.09 | 0.55 | -0.35 |
| xb_04 | - | - | - | - | - | - | - | - | - | - | - | 1 | 0.65 | 0.52 | 0.72 | 0.83 | 0.35 | 0.18 | 0.26 | 0.46 | 0.31 | 0.25 | 0.32 | 0.42 | 0.55 | 0.31 | 0.28 | 0.05 | 0.07 | 0 | -0.15 | -0.07 | -0.12 |
| xb_05 | - | - | - | - | - | - | - | - | - | - | - | - | 1 | -0.22 | 0.49 | 0.53 | 0.22 | -0.36 | 0.64 | 0.3 | 0.66 | -0.31 | 0.21 | 0.27 | 0.35 | 0.72 | -0.34 | 0.09 | 0.51 | -0.47 | -0.12 | 0.41 | -0.52 |
| xb_06 | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | 0.36 | 0.44 | 0.18 | 0.59 | -0.36 | 0.23 | -0.34 | 0.68 | 0.16 | 0.21 | 0.28 | -0.37 | 0.7 | -0.03 | -0.45 | 0.51 | -0.06 | -0.51 | 0.39 |
| xb_07 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | 0.28 | 0.3 | 0.15 | 0.22 | 0.29 | 0.22 | 0.14 | 0.37 | 0.15 | 0.43 | 0.27 | 0.18 | 0.22 | 0.2 | 0.08 | 0.16 | 0.13 | 0.08 |
| xb_08 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | 0.24 | 0.12 | 0.2 | 0.41 | 0.27 | 0.23 | 0.16 | 0.48 | 0.43 | 0.23 | 0.23 | -0.09 | -0.04 | -0.07 | -0.35 | -0.21 | -0.26 |
| xn_01 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | 0.63 | 0.63 | 0.77 | 0.51 | 0.45 | 0.69 | 0.57 | 0.31 | 0.14 | 0.18 | 0.01 | 0 | 0.04 | 0.1 | 0.03 | 0.08 |
| xn_02 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | -0.14 | 0.45 | -0.2 | 0.78 | 0.44 | 0.32 | 0.18 | -0.43 | 0.6 | -0.04 | -0.47 | 0.51 | 0.11 | -0.43 | 0.50 |
| xn_03 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | 0.53 | 0.87 | -0.2 | 0.44 | 0.4 | 0.22 | 0.61 | -0.38 | 0.05 | 0.46 | -0.46 | 0.01 | 0.48 | -0.40 |
| xn_04 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | 0.67 | 0.58 | 0.77 | 0.85 | 0.34 | 0.17 | 0.18 | -0.01 | 0 | 0 | -0.13 | -0.09 | -0.10 |
| xn_05 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | -0.12 | 0.51 | 0.56 | 0.21 | 0.6 | -0.38 | 0.03 | 0.45 | -0.47 | -0.09 | 0.39 | -0.48 |
| xn_06 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | 0.46 | 0.48 | 0.2 | -0.41 | 0.62 | -0.05 | -0.46 | 0.49 | -0.06 | -0.49 | 0.36 |
| xn_07 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | 0.39 | 0.22 | 0.11 | 0.12 | 0.02 | 0.02 | 0.03 | 0.17 | 0.09 | 0.12 |
| xn_08 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | 0.31 | 0.15 | 0.17 | -0.02 | 0 | -0.02 | -0.35 | -0.22 | -0.26 |
| xs_01 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | 0.54 | 0.53 | 0.16 | 0.09 | 0.1 | -0.22 | -0.12 | -0.17 |
| xs_02 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | -0.33 | 0.1 | 0.55 | -0.51 | -0.08 | 0.47 | -0.53 |
| xs_03 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | 0.06 | -0.44 | 0.61 | -0.15 | -0.59 | 0.35 |
| xs_04 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | 0.68 | 0.53 | -0.12 | -0.02 | -0.14 |
| xs_05 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | -0.18 | -0.06 | 0.45 | -0.49 |
| xs_06 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | -0.06 | -0.51 | 0.37 |
| xw_01 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | 0.62 | 0.71 |
| xw_02 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1 | -0.06 |
| xw_03 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1.00 |
df[,order(names(df))] %>% #reorder for my brain
select(starts_with("x")) %>%
cor() %>%
corrplot::corrplot(type = "upper", method = "square")
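response, log(response)
Most sentiment-derived features are positively, though mostly weakly, correlated with both response and log(response). The strongest correlations involve xw_01 (r = 0.44 with response and 0.54 with log(response)), xn_07, xn_01, and xb_01. A few features (e.g., xa_08, xb_08) are weakly negative, and the sentimentr features (xs_) are essentially uncorrelated with either version of the response (|r| ≤ 0.12).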
corr_inputs_response <- df[,order(names(df))] %>% #reorder for my brain
mutate(log_response = log(response)) %>%
select(response, log_response, starts_with("x")) %>%
cor() %>%
as.data.frame() %>%
round(digits = 2) %>% #round numbers to 2 decimal places
select(response, log_response)
kable(corr_inputs_response, digits = 2)
| | response | log_response |
|---|---|---|
| response | 1.00 | 0.90 |
| log_response | 0.90 | 1.00 |
| xa_01 | 0.35 | 0.40 |
| xa_02 | 0.28 | 0.40 |
| xa_03 | 0.18 | 0.13 |
| xa_04 | 0.10 | 0.08 |
| xa_05 | 0.04 | -0.04 |
| xa_06 | 0.07 | 0.15 |
| xa_07 | 0.27 | 0.30 |
| xa_08 | -0.06 | -0.11 |
| xb_01 | 0.38 | 0.39 |
| xb_02 | 0.31 | 0.41 |
| xb_03 | 0.19 | 0.11 |
| xb_04 | 0.14 | 0.10 |
| xb_05 | 0.08 | -0.02 |
| xb_06 | 0.09 | 0.15 |
| xb_07 | 0.31 | 0.31 |
| xb_08 | -0.02 | -0.09 |
| xn_01 | 0.38 | 0.41 |
| xn_02 | 0.28 | 0.40 |
| xn_03 | 0.19 | 0.13 |
| xn_04 | 0.28 | 0.28 |
| xn_05 | 0.18 | 0.11 |
| xn_06 | 0.20 | 0.28 |
| xn_07 | 0.38 | 0.43 |
| xn_08 | 0.09 | 0.04 |
| xs_01 | 0.03 | -0.01 |
| xs_02 | -0.01 | -0.12 |
| xs_03 | 0.03 | 0.08 |
| xs_04 | 0.02 | -0.01 |
| xs_05 | 0.01 | -0.08 |
| xs_06 | 0.04 | 0.11 |
| xw_01 | 0.44 | 0.54 |
| xw_02 | 0.25 | 0.22 |
| xw_03 | 0.31 | 0.45 |
df[,order(names(df))] %>% #reorder for my brain
mutate(log_response = log(response)) %>%
select(response, log_response, starts_with("x")) %>%
cor() %>%
corrplot::corrplot(type = "upper", method = "square")
#only the top two rows (response, log_response) matter here; this plot could be merged with the previous correlation plot
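Only the response and log_response rows of the full matrix are of interest here, so base R’s `cor(x, y)` can compute just the input-by-response cross-correlations. The sketch below uses simulated data: `inputs` and `response` are placeholders, not the document’s objects. It orders the inputs by the absolute value of their correlation with log(response).

```r
# Simulated data (not the PPG data): xa drives the response, xb does not
set.seed(2)
inputs   <- data.frame(xa = rnorm(40), xb = rnorm(40))
response <- exp(0.5 * inputs$xa + rnorm(40, sd = 0.3))  # strictly positive

# cor(x, y) returns an ncol(x)-by-ncol(y) matrix: one row per input,
# one column per version of the response.
r <- cor(inputs, cbind(response = response, log_response = log(response)))

# Inputs ordered by absolute correlation with log(response)
r[order(abs(r[, "log_response"]), decreasing = TRUE), , drop = FALSE]
```

Applied to this document, `x` would be the `x` columns of `df` and `y` would be `cbind(response = df$response, log_response = log(df$response))`.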
Bing x response
Overall, there aren’t any clear trends.
input_names <- df %>% select(starts_with("xb")) %>% colnames()
df %>%
select(response, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "response")) %>%
ggplot(mapping = aes(x = value, y = response)) +
geom_point(alpha = .33) +
facet_wrap(~name, scales = "free") +
theme_bw()
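Bing x log(response)
For the most part, there aren’t any clear trends. However,
xb_01 and xb_02 appear to be positively related to the
log-transformed response (r = 0.39 and 0.41, respectively). xb_03
correlates only weakly with log(response) (r = 0.11), while xb_07
shows a somewhat stronger association (r = 0.31).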
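One way to put a number on a visual "positive relation" is to regress log(response) on each feature separately and inspect the slope. This is a check swapped in for illustration and is not part of the original analysis. The sketch uses simulated data, and `feature_slopes()` is a hypothetical helper. The single-predictor fits ignore the other features and do not correct for testing many features at once.

```r
# Simulated data (not the PPG data): only xb_01 affects the response
set.seed(3)
toy <- data.frame(xb_01 = runif(60), xb_02 = runif(60))
toy$response <- exp(1 + 2 * toy$xb_01 + rnorm(60, sd = 0.3))

# Hypothetical helper: for each feature, fit log(y) ~ feature and keep the
# slope estimate and its two-sided p-value (one row per feature).
feature_slopes <- function(data, features, y = "response") {
  t(sapply(features, function(f) {
    fit <- lm(log(data[[y]]) ~ data[[f]])
    coef(summary(fit))[2, c("Estimate", "Pr(>|t|)")]
  }))
}

feature_slopes(toy, c("xb_01", "xb_02"))
```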
input_names <- df %>% select(starts_with("xb")) %>% colnames()
df %>%
select(response, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "response")) %>%
ggplot(mapping = aes(x = value, y = log(response))) +
geom_point(alpha = .33) +
facet_wrap(~name, scales = "free") +
theme_bw()
NRC x response
Overall, there aren’t any clear trends.
input_names <- df %>% select(starts_with("xn")) %>% colnames()
df %>%
select(response, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "response")) %>%
ggplot(mapping = aes(x = value, y = response)) +
geom_point(alpha = .33) +
facet_wrap(~name, scales = "free") +
theme_bw()
NRC x log(response)
For the most part, there aren’t any clear trends. However,
xn_01, xn_02, and xn_07 appear to be positively related to the
log-transformed response (r = 0.41, 0.40, and 0.43, respectively).
input_names <- df %>% select(starts_with("xn")) %>% colnames()
df %>%
select(response, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "response")) %>%
ggplot(mapping = aes(x = value, y = log(response))) +
geom_point(alpha = .33) +
facet_wrap(~name, scales = "free") +
theme_bw()
AFINN x response
Overall, there don’t seem to be any clear trends.
input_names <- df %>% select(starts_with("xa")) %>% colnames()
df %>%
select(response, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "response")) %>%
ggplot(mapping = aes(x = value, y = response)) +
geom_point(alpha = .33) +
facet_wrap(~name, scales = "free") +
theme_bw()
AFINN x log(response)
For the most part, there aren’t any clear trends. However,
xa_01 and xa_02 appear to be positively related to the
log-transformed response (r = 0.40 for both).
input_names <- df %>% select(starts_with("xa")) %>% colnames()
df %>%
select(response, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "response")) %>%
ggplot(mapping = aes(x = value, y = log(response))) +
geom_point(alpha = .33) +
facet_wrap(~name, scales = "free") +
theme_bw()
Word count x response
Overall, there don’t seem to be any clear trends.
input_names <- df %>% select(starts_with("xw")) %>% colnames()
df %>%
select(response, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "response")) %>%
ggplot(mapping = aes(x = value, y = response)) +
geom_point(alpha = .33) +
facet_wrap(~name, scales = "free") +
theme_bw()
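Word count x log(response)
There seems to be a positive trend between xw_01 and the
log-transformed response (r = 0.54, the largest input correlation with
log(response)); xw_03 shows a weaker positive association (r = 0.45).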
input_names <- df %>% select(starts_with("xw")) %>% colnames()
df %>%
select(response, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "response")) %>%
ggplot(mapping = aes(x = value, y = log(response))) +
geom_point(alpha = .33) +
facet_wrap(~name, scales = "free") +
theme_bw()
sentimentr x response
Overall, there don’t seem to be any clear trends.
input_names <- df %>% select(starts_with("xs")) %>% colnames() #sentimentr features
df %>%
select(response, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "response")) %>%
ggplot(mapping = aes(x = value, y = response)) +
geom_point(alpha = .33) +
facet_wrap(~name, scales = "free") +
theme_bw()
sentimentr x log(response)
There aren’t any clear trends. The correlations between the sentimentr
features and the log-transformed response are all small, ranging from
-0.12 (xs_02) to 0.11 (xs_06).
input_names <- df %>% select(starts_with("xs")) %>% colnames() #sentimentr features
df %>%
select(response, all_of(input_names)) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "response")) %>%
ggplot(mapping = aes(x = value, y = log(response))) +
geom_point(alpha = .33) +
facet_wrap(~name, scales = "free") +
theme_bw()
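Each lexicon’s chunk is a copy of the same pipeline with a different prefix, so a mistyped prefix (for example `starts_with("xb")` in a sentimentr chunk) silently plots the wrong features. One safeguard is to pass the prefix once to a small helper. The sketch below is in base R, and `stack_by_prefix()` is hypothetical. Its output has the same rowid, name, value, and response columns as the `pivot_longer()` step, so it could feed the same `ggplot()` call.

```r
# Hypothetical helper: stack every column whose name starts with `prefix`
# into long format (rowid, name, value) and carry along the column `keep`.
stack_by_prefix <- function(data, prefix, keep = "response") {
  cols <- grep(paste0("^", prefix), names(data), value = TRUE)
  if (length(cols) == 0) stop("no columns start with '", prefix, "'")
  out <- data.frame(
    rowid = rep(seq_len(nrow(data)), times = length(cols)),
    name  = rep(cols, each = nrow(data)),
    value = unlist(data[cols], use.names = FALSE),  # column after column
    stringsAsFactors = FALSE
  )
  out[[keep]] <- rep(data[[keep]], times = length(cols))
  out
}

# Toy data, not the PPG data
toy <- data.frame(response = c(1, 2),
                  xs_01 = c(0.1, 0.2), xs_02 = c(0.3, 0.4),
                  xb_01 = c(9, 9))
stack_by_prefix(toy, "xs_")
```

An unknown prefix raises an error instead of quietly returning an empty plot.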
outcome
Bing x outcome
The values of the Bing-derived features do not appear to differ by the binary outcome: their distributions overlap heavily and their means are similar.
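"Similar mean values" can be quantified with a standardized mean difference for each feature: Cohen’s d, computed with a pooled SD. Values near zero indicate little separation between the outcome groups. The sketch below uses toy data, and `smd_by_outcome()` is a hypothetical helper. The level names "event" and "non_event" are invented for the example, and the sign convention (second factor level minus first) is an arbitrary choice.

```r
# Hypothetical helper: Cohen's d (pooled SD) between the two levels of a
# binary outcome, one value per feature; positive = higher in level 2.
smd_by_outcome <- function(data, features, outcome = "outcome") {
  g <- factor(data[[outcome]])
  stopifnot(nlevels(g) == 2)
  sapply(features, function(f) {
    x1 <- data[[f]][g == levels(g)[1]]
    x2 <- data[[f]][g == levels(g)[2]]
    n1 <- length(x1)
    n2 <- length(x2)
    pooled_sd <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
    (mean(x2) - mean(x1)) / pooled_sd
  })
}

# Toy data, not the PPG data
toy <- data.frame(outcome = rep(c("event", "non_event"), each = 4),
                  xb_01 = c(1, 2, 3, 4, 1, 2, 3, 4),
                  xb_02 = c(1, 2, 3, 4, 3, 4, 5, 6))
smd_by_outcome(toy, c("xb_01", "xb_02"))
```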
df %>%
select(outcome, starts_with("xb")) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "outcome")) %>%
ggplot(mapping = aes(x = name,
y = value,
color = outcome)) +
geom_violin() +
stat_summary(fun.data=mean_sdl,
aes(fill = outcome), fun.args = list(mult = 1), #display mean, and +/- 1 sd
geom="pointrange", color="black",
shape = 16, size = .33,
position = position_dodge(width = 0.9)) +
facet_wrap(~name, scales = "free") +
ylab("sentiment value") +
theme_bw()
NRC x outcome
The values of the NRC-derived features do not appear to differ by the binary outcome: their distributions overlap heavily and their means are similar.
df %>%
select(outcome, starts_with("xn")) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "outcome")) %>%
ggplot(mapping = aes(x = name,
y = value,
color = outcome)) +
geom_violin() +
stat_summary(fun.data=mean_sdl,
aes(fill = outcome), fun.args = list(mult = 1), #display mean, and +/- 1 sd
geom="pointrange", color="black",
shape = 16, size = .33,
position = position_dodge(width = 0.9)) +
facet_wrap(~name, scales = "free") +
ylab("sentiment value") +
theme_bw()
AFINN x outcome
The values of the AFINN-derived features do not appear to differ by the binary outcome: their distributions overlap heavily and their means are similar.
df %>%
select(outcome, starts_with("xa")) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "outcome")) %>%
ggplot(mapping = aes(x = name,
y = value,
color = outcome)) +
geom_violin() +
stat_summary(fun.data=mean_sdl,
aes(fill = outcome), fun.args = list(mult = 1), #display mean, and +/- 1 sd
geom="pointrange", color="black",
shape = 16, size = .33,
position = position_dodge(width = 0.9)) +
facet_wrap(~name, scales = "free") +
ylab("sentiment value") +
theme_bw()
Word count x outcome
The values of the word count features do not appear to differ by the binary outcome: their distributions overlap heavily and their means are similar.
df %>%
select(outcome, starts_with("xw")) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "outcome")) %>%
ggplot(mapping = aes(x = name,
y = value,
color = outcome)) +
geom_violin() +
stat_summary(fun.data=mean_sdl,
aes(fill = outcome), fun.args = list(mult = 1), #display mean, and +/- 1 sd
geom="pointrange", color="black",
shape = 16, size = .33,
position = position_dodge(width = 0.9)) +
facet_wrap(~name, scales = "free") +
ylab("sentiment value") +
theme_bw()
sentimentr x outcome
The values of the sentimentr features do not appear to differ by the binary outcome: their distributions overlap heavily and their means are similar.
df %>%
select(outcome, starts_with("xs")) %>%
tibble::rowid_to_column() %>%
pivot_longer(!c("rowid", "outcome")) %>%
ggplot(mapping = aes(x = name,
y = value,
color = outcome)) +
geom_violin() +
stat_summary(fun.data=mean_sdl,
aes(fill = outcome), fun.args = list(mult = 1), #display mean, and +/- 1 sd
geom="pointrange", color="black",
shape = 16, size = .33,
position = position_dodge(width = 0.9)) +
facet_wrap(~name, scales = "free") +
ylab("sentiment value") +
theme_bw()